There have been some questions after the release last weekend of the new XML sitemaps implementation in WordPress SEO. Let me try to address most of those questions in one post as well as explain the ideas behind it.
The basic idea behind the rebuild of the XML sitemap functionality was simple: XML sitemaps needed to be more scalable, have a better API and cause fewer issues. Most of the issues had two causes:
- the fact that I was writing XML sitemaps to disk as static files;
- the fact that I tried to generate one huge sitemap to catch everything.
Neither of these was really a design decision; it was just how things “turned out” when I built the functionality, and both proved to be stupid. So, with the help of the incredibly talented Jon Cave, a WordPress core dev from the UK, the XML sitemap functionality has been rebuilt. It now has an index sitemap file that points to several sitemaps, one for each post type and taxonomy.
If you have more than 1,000 items in a post type, for instance more than 1,000 posts, it will automatically start splitting that post type’s sitemap up into sitemaps of 1,000 items each, so as to reduce the load time per sitemap. These sitemaps are generated in a completely new way, on the fly, without caching them to disk. This means there won’t be a delay anymore (caused by the generation process) when you publish a new post.
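To make the splitting a bit more concrete, here is a minimal sketch of the arithmetic involved. It is not the plugin’s actual code: the function names `sketch_sitemap_page_count` and `sketch_sitemap_page_range` are hypothetical and only illustrate how a limit of 1,000 items per sitemap could translate into a number of sub sitemaps and a range of items for each one.

```php
<?php
// Hypothetical helper, not part of the plugin: how many sub sitemaps
// a post type needs when each sitemap holds at most $per_page items.
function sketch_sitemap_page_count(int $post_count, int $per_page = 1000): int {
    return (int) ceil($post_count / $per_page);
}

// Hypothetical helper: the zero-based offset and the limit for the
// N-th sub sitemap (pages are counted from 1).
function sketch_sitemap_page_range(int $page, int $per_page = 1000): array {
    $page = max(1, $page);
    return ['offset' => ($page - 1) * $per_page, 'limit' => $per_page];
}
```

With these assumptions, a post type with exactly 1,000 posts still fits in one sitemap, while 1,001 posts need two; the second sitemap would start at offset 1,000.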
The sitemap index file can always be found at sitemap_index.xml; a link to it can be found in the new XML sitemaps menu.
The new code also means there’s a great new API for XML sitemaps, which will first be used in the Google News module. Documentation for that is coming. Because of a new filter in the code, NextGen Gallery will now also be able to add its images into the sitemap, so all of your images can be found by Google.
FAQ regarding the new XML Sitemaps functionality
- Why do I get a 404 when opening the sitemap_index.xml file?
You probably have W3 Total Cache active, which is preventing 404 errors for static files from going through to WordPress. Please switch to a W3 Total Cache alternative.
- Why do I get an error saying “not a valid feed template”?
You’re probably using a Woothemes tumble-style theme. Check with Woothemes about an update; they’re using a very general rule for XML sitemaps in their old code. Or, if you’re a developer, go and find a file that’ll probably be called /includes/tumblog/theme-tumblog.php and fix / remove the line that looks like this:
'(.+).xml' => 'index.php?feed='. $wp_rewrite->preg_index(1)
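To see why that rule is a problem, here is a small standalone sketch. It assumes that a rewrite rule is tested as a regular expression anchored at the start of the requested path, and wraps the theme’s pattern accordingly; the request name `anything-else.xml` is made up for illustration. Under that assumption, any request ending in .xml matches, and the part before the extension would be handed on as a feed name.

```php
<?php
// The theme's pattern, wrapped in delimiters and anchored at the start of
// the request path (an assumption about how rewrite rules are matched).
$pattern = '#^(.+).xml#';

// sitemap_index.xml is from this post; anything-else.xml is a made-up example.
$requests = ['sitemap_index.xml', 'anything-else.xml'];

$captured = [];
foreach ($requests as $request) {
    if (preg_match($pattern, $request, $matches)) {
        // $matches[1] is what the rule would pass on as the "feed" value.
        $captured[$request] = $matches[1];
    }
}

print_r($captured);
```

Both requests match, so a request for sitemap_index.xml would be treated as a request for a feed called “sitemap_index”, which is consistent with the “not a valid feed template” error.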
- Why does my sitemap show as type “images” in GWT?
Because images are contained within the sitemap, and Google Webmaster Tools then immediately shows it as type “images”. There is no “mixed” type in Google Webmaster Tools at the moment.
- Which sitemap should I submit to Google Webmaster Tools?
The index file, sitemap_index.xml, should be submitted to Google Webmaster Tools automatically if you have the ping setting for Google on. You’ll then find the sitemap under the “All” links, on the right-hand side in GWT.
You’ll then be able to click on the sitemap_index.xml file and see the sub sitemaps and indexation per sub sitemap.
- So there’s no sitemap.xml file anymore?
No, correct. This change was made to prevent collisions with other sitemaps, something that happened quite regularly. You can safely remove it from Google Webmaster Tools and other search engine portals.
- What to do with the wp-content/uploads/wpseo/ folder?
You can safely remove it; as mentioned above, the plugin no longer writes static files.
- The plugin doesn’t add anything to the .htaccess file?
No, it doesn’t; the rewrites for the files are handled by WordPress internals.
- The plugin doesn’t add the sitemap to the robots.txt file!
No, it doesn’t. There’s no really good reason to do that when you’re pinging the search engines about changes with the sitemap. In my experience it only helps scrapers find new content on your site. The fact that another XML sitemap plugin does do that doesn’t make it better, nor a requirement.
- Is there a .gz version of the sitemaps file?
No, but if you run a caching plugin, most of the time the output will be gzipped anyway.