The robots.txt file is a very powerful tool when you're working on a site's SEO, but it has to be used with care. It allows you to deny search engines access to certain files and folders, and that's very often not what you want to do. Over the years, Google especially has changed a lot in how it crawls the web, so old best practices are no longer valid. This post explains the new best practices for your WordPress robots.txt and why.
Google fully renders your site
Google now renders your pages fully, using your CSS and JavaScript, so the old best practice of having a robots.txt that blocks access to your wp-includes directory and your plugins directory is no longer valid: blocking those directories keeps Google from loading the scripts and styles it needs to render your pages properly. This is why, in WordPress 4.0, I opened the issue and wrote the patch to remove wp-includes/.* from the default WordPress robots.txt. Many themes also rely on asynchronous JavaScript requests (AJAX) to add content to pages, and the default robots.txt blocked search engines from the admin-ajax.php URL in wp-admin. This was fixed in WordPress 4.4.
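For reference, the virtual robots.txt that a default WordPress install has generated since 4.4 looks roughly like this; treat it as a sketch, since your own file will differ if you serve a physical robots.txt:

    User-agent: *
    Disallow: /wp-admin/
    Allow: /wp-admin/admin-ajax.php

Note the Allow line: it carves admin-ajax.php out of the otherwise blocked wp-admin directory, so search engines can still fetch content that themes load via AJAX.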
Robots.txt denies links their value
Something else is very important to keep in mind: if you block a URL with your site's robots.txt, search engines will not crawl those pages. That also means they cannot distribute the link value pointing at those URLs. So if you have a section of your site that you'd rather not have show up in the search results but that does get a lot of links, don't use the robots.txt file. Instead, use a robots meta tag with the value noindex, follow. This allows search engines to keep crawling those pages and to properly distribute the link value for them across your site.
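Concretely, that is a single tag in the head section of the page in question; a minimal example:

    <meta name="robots" content="noindex, follow">

Search engines that honor the tag will drop the page from their index but still follow its links, passing their value on to the rest of your site.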