The robots.txt file is a very powerful tool when you’re working on a website’s SEO, but it should be handled with care. It allows you to deny search engines access to different files and folders, but often that’s not what you want to do these days. Over the years, Google in particular has changed a lot in how it crawls the web, so what was best practice a few years ago often doesn’t work anymore. This post outlines current best practice for your WordPress robots.txt file and explains why you should adopt it.
Google now fully renders your site
The previous best practice of blocking access to your wp-includes directory and your plugins directory via robots.txt is no longer valid, which is why, in WordPress 4.0, I opened the issue and wrote the patch to remove wp-includes/.* from the default WordPress robots.txt. WordPress also used to block the admin-ajax.php URL in wp-admin by default. This was fixed in WordPress 4.4.
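To make this concrete, here is a minimal sketch using Python's standard urllib.robotparser module. It shows how old-style rules keep a crawler away from the scripts and stylesheets it needs to render a page. The rules and asset paths are hypothetical examples, not the exact historical WordPress defaults.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical old-style rules of the kind described above
# (an illustration, not the exact historical WordPress default).
OLD_STYLE_RULES = """\
User-agent: *
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
"""

# Hypothetical asset paths a theme or plugin might load on every page.
RENDER_ASSETS = [
    "/wp-includes/js/jquery/jquery.js",
    "/wp-content/plugins/example-plugin/style.css",
    "/wp-content/themes/example-theme/style.css",
]


def blocked_paths(robots_txt, paths, user_agent="Googlebot"):
    """Return the paths that robots_txt forbids for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, path)]


print(blocked_paths(OLD_STYLE_RULES, RENDER_ASSETS))
# ['/wp-includes/js/jquery/jquery.js', '/wp-content/plugins/example-plugin/style.css']
```

Python's parser uses simple prefix matching and does not interpret wildcards the way Google does, so treat the result as a rough check rather than a verdict on how Googlebot will behave.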
Robots.txt denies links their value
There’s something else that’s very important to remember: if you use your site’s robots.txt to block a URL, search engines won’t crawl it. This also means that they can’t distribute the link value pointing at blocked URLs. So if there’s an area of your site that has a lot of links pointing at it but that you’d rather not have appear in search results, don’t block it via robots.txt. Use a robots meta tag with a value of noindex, follow instead. This allows search engines to properly distribute the link value for those pages across your site.
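If you want to verify that a page really carries that tag, a small script can help. The sketch below uses Python's standard html.parser module; the sample page, the class name and the function name are made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # "noindex, follow" becomes {"noindex", "follow"}
        self.directives.update(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def robots_directives(html):
    """Return the set of robots meta directives present in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page that should stay out of the index but still pass link value.
SAMPLE_PAGE = """<html><head>
<title>Internal search results</title>
<meta name="robots" content="noindex, follow">
</head><body></body></html>"""

print(sorted(robots_directives(SAMPLE_PAGE)))
# ['follow', 'noindex']
```

Note that a search engine can only see this tag if it is allowed to crawl the page, which is exactly why the URL must not be blocked in robots.txt at the same time.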
Our WordPress robots.txt example
So, what should be in your WordPress robots.txt? Ours is very clean now – we block almost nothing! This means we don’t block our
We also don’t block our /wp-admin/ folder. The reason is simple: if you block it but link to it somewhere by chance, people will still be able to run a simple [inurl:wp-admin] query in Google and find your site, which is just the kind of query malicious hackers love to run. WordPress now sends (by my doing) an X-Robots-Tag HTTP header on the admin pages that prevents search engines from displaying them in search results, which is a much cleaner solution. However, we do block our Yoast Suggest tool, because the dynamic results it creates once opened a spider trap.
What you should do with your robots.txt
Log in to Google Search Console and, under Crawl → Fetch as Google, use the Fetch and Render option.
If the rendered page doesn’t look the same as when viewing your site in a browser, or it throws errors or notices, fix them by removing the lines in your robots.txt file that block access to the URLs identified in the notices.
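Finding the offending lines by hand gets tedious in a long file. Here is a rough sketch of how you could map each reported URL back to the Disallow lines that block it. It assumes plain prefix matching and ignores user-agent groups and wildcards, and the function name, rules and URL are hypothetical.

```python
from urllib.parse import urlsplit


def offending_rules(robots_txt, blocked_urls):
    """Map each blocked URL to the (line number, path) of matching Disallow rules.

    Simplified on purpose: plain prefix matching only, no user-agent
    grouping and no wildcard support.
    """
    rules = []
    for number, line in enumerate(robots_txt.splitlines(), start=1):
        text = line.split("#", 1)[0].strip()  # drop trailing comments
        field, _, value = text.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            rules.append((number, value.strip()))

    report = {}
    for url in blocked_urls:
        path = urlsplit(url).path or "/"
        report[url] = [(n, rule) for n, rule in rules if path.startswith(rule)]
    return report


# Hypothetical robots.txt and a URL reported as blocked.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /private/
"""
BLOCKED = ["https://example.com/wp-includes/js/jquery/jquery.js"]

for url, matches in offending_rules(ROBOTS_TXT, BLOCKED).items():
    for number, rule in matches:
        print(f"{url} is blocked by line {number}: Disallow: {rule}")
# https://example.com/wp-includes/js/jquery/jquery.js is blocked by line 2: Disallow: /wp-includes/
```

After removing the lines it reports, run Fetch and Render again to confirm the page now renders as it does in a browser.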
Should you link to your XML Sitemap from your robots.txt?
We’ve always felt it pointless to link to your XML sitemap from your robots.txt file, because you should add your sitemap manually to your Google Search Console and Bing Webmaster Tools accounts and look at their feedback about it. This is why our Yoast SEO plugin doesn’t add it to your robots.txt. Don’t rely on search engines finding out about your XML sitemap through your robots.txt.