
WordPress robots.txt example for great SEO

The robots.txt file is a very powerful tool when you’re working on a website’s SEO – but it should be handled with care. It allows you to deny search engines access to different files and folders, but often that’s not what you want to do these days. Over the years Google especially has changed a lot in how it crawls the web, so often what used to be best practice a few years ago doesn’t work anymore. This post outlines current best practice for your WordPress robots.txt file and explains why you should adopt it.

Google now fully renders your site

Google is no longer the dumb little kid that only fetches your site’s HTML and ignores your styling and JavaScript. It fetches everything and renders your pages completely. So Google doesn’t like it at all when you deny it access to your CSS or JavaScript files – this post about Google Panda 4 gives an example of why that is. We warned against it again in another post and will keep on saying it: don’t ever block your CSS and JavaScript files.

The previous best practice of blocking access to your wp-includes directory and your plugins directory via robots.txt is no longer valid. That is why, for WordPress 4.0, I opened the issue and wrote the patch to remove wp-includes/.* from the default WordPress robots.txt.

Many themes also use asynchronous JavaScript requests – so-called AJAX – to add content to web pages. WordPress used to block Google from this by default, so I created a ticket to update WordPress Core to allow Google to crawl the admin-ajax.php URL in wp-admin. This was fixed in WordPress 4.4.
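To make the admin-ajax.php exception concrete, here is a minimal sketch of how a crawler that follows Google's documented "most specific rule wins" behaviour would read a Disallow for /wp-admin/ combined with an Allow for /wp-admin/admin-ajax.php, which is roughly what the default WordPress robots.txt has looked like since 4.4. The function name is_allowed and the rule list are our own illustration, and the matcher is deliberately simplified: plain path prefixes only, no wildcards, a single user-agent group.

```python
# Simplified longest-match evaluation of robots.txt rules.
# Assumption: rules are plain path prefixes (no "*" or "$" wildcards)
# and all belong to one "User-agent: *" group.

def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, prefix) tuples, where directive is
    "allow" or "disallow". The longest matching prefix wins; on a tie,
    "allow" wins. A path that matches no rule is allowed.
    """
    best_length = -1
    allowed = True
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        longer = len(prefix) > best_length
        tie_allow = len(prefix) == best_length and directive == "allow"
        if longer or tie_allow:
            best_length = len(prefix)
            allowed = directive == "allow"
    return allowed


# Roughly the rules in the default WordPress robots.txt since 4.4.
WP_RULES = [
    ("disallow", "/wp-admin/"),
    ("allow", "/wp-admin/admin-ajax.php"),
]

if __name__ == "__main__":
    for path in ("/wp-admin/admin-ajax.php", "/wp-admin/options.php", "/blog/"):
        print(path, "->", "allowed" if is_allowed(WP_RULES, path) else "blocked")
```

Note that not every parser resolves conflicts this way. Python's own urllib.robotparser, for instance, applies the first matching rule in file order, so the same two lines can give a different answer there.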

Robots.txt denies links their value

There’s something else that’s very important to remember: if you use your site’s robots.txt to block a URL, search engines won’t crawl it. This also means that they can’t distribute the link value pointing at blocked URLs. So if there’s an area of your site that has a lot of links pointing at it but that you’d rather not have appear in search results, don’t block it via robots.txt. Use a robots meta tag with a value of noindex, follow instead. This allows search engines to properly distribute the link value for those pages across your site.
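If you want to verify that a page really carries that tag, a small check like the one below can help. It is only a sketch: the class RobotsMetaParser, the function robots_directives and the sample HTML are made up for illustration, and it only looks at a meta tag named robots, not at crawler-specific tags or HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # "noindex, follow" -> {"noindex", "follow"}
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots meta directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page that should stay out of the index but pass on link value.
SAMPLE_PAGE = """
<html>
  <head>
    <title>Internal search results</title>
    <meta name="robots" content="noindex, follow">
  </head>
  <body><a href="/some-post/">Some post</a></body>
</html>
"""

if __name__ == "__main__":
    found = robots_directives(SAMPLE_PAGE)
    print("noindex" in found and "follow" in found)
```

Keep in mind that search engines can only see this tag if they are allowed to crawl the page, which is exactly why blocking the same URL in robots.txt defeats it.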


Our WordPress robots.txt example

So, what should be in your WordPress robots.txt? Ours is very clean now – we block almost nothing! This means we don’t block our /wp-content/plugins/ directory, as plugins might output JavaScript or CSS that Google needs to render the page. And we also don’t block our /wp-includes/ directory, as the default JavaScript that comes with WordPress, which many themes use, lives there.

We also don’t block our /wp-admin/ folder. The reason is simple: if you block it, but link to it somewhere by chance, people will still be able to do a simple [inurl:wp-admin] query in Google and find your site – just the kind of query malicious hackers love to do. Now, WordPress sends (by my doing) a robots HTTP header, X-Robots-Tag, on the admin pages that prevents search engines from displaying them in search results, which is a much cleaner solution. However, we do block our Yoast Suggest tool, because the dynamic results it generates once created a spider trap.
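As a rough illustration of what such a near-empty file means in practice, the sketch below feeds a one-rule robots.txt to Python's standard urllib.robotparser and checks a few typical WordPress URLs. The path /example-tool/ is a hypothetical stand-in for a tool that generates endless dynamic URLs, not the actual path we block, and example.com is a placeholder domain.

```python
from urllib.robotparser import RobotFileParser

# A near-empty robots.txt: everything is crawlable except one
# hypothetical tool that could otherwise become a spider trap.
CLEAN_ROBOTS_TXT = """\
User-agent: *
Disallow: /example-tool/
"""


def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(CLEAN_ROBOTS_TXT)
    urls = [
        "https://example.com/wp-includes/js/jquery/jquery.js",
        "https://example.com/wp-content/plugins/some-plugin/style.css",
        "https://example.com/wp-admin/",
        "https://example.com/example-tool/?q=seo",
    ]
    for url in urls:
        verdict = "crawlable" if parser.can_fetch("Googlebot", url) else "blocked"
        print(verdict, url)
```

With this file, the JavaScript and CSS in wp-includes and the plugins directory stay reachable for rendering, and only the tool URLs are kept out of the crawl.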

What you should do with your robots.txt

Log into Google Search Console and under Crawl → Fetch as Google, use the Fetch and Render option.

If it doesn’t look the same as when viewing your site in a browser, or it throws errors or notices, fix them by removing the lines in your robots.txt file that block access to the URLs identified in the notices.
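Once you have a list of blocked resources, finding the responsible lines by hand can be tedious in a long file. The sketch below is one way to do it, under simplifying assumptions: it treats every Disallow value as a plain path prefix, ignores wildcards and user-agent groups, and the function name blocking_lines and the sample file are ours.

```python
from urllib.parse import urlparse


def blocking_lines(robots_txt, blocked_urls):
    """Map each URL to the Disallow lines whose prefix matches its path.

    Returns {url: [(line_number, line_text), ...]} with 1-based line numbers.
    Simplification: no wildcard support, user-agent groups are ignored.
    """
    disallows = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            disallows.append((number, line, value.strip()))

    report = {}
    for url in blocked_urls:
        path = urlparse(url).path or "/"
        report[url] = [
            (number, text)
            for number, text, prefix in disallows
            if path.startswith(prefix)
        ]
    return report


# An old-style robots.txt of the kind this post advises against.
OLD_ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
"""

if __name__ == "__main__":
    blocked = ["https://example.com/wp-includes/js/jquery/jquery.js"]
    for url, lines in blocking_lines(OLD_ROBOTS_TXT, blocked).items():
        for number, text in lines:
            print(f"{url} is blocked by line {number}: {text}")
```

Remove the reported lines, then run Fetch and Render again to confirm the page now renders as it does in a browser.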

Should you link to your XML Sitemap from your robots.txt?

We’ve always felt it pointless to link to your XML sitemap from your robots.txt file, because you should add your sitemap manually to your Google Search Console and Bing Webmaster Tools accounts and look at their feedback about it. This is why our Yoast SEO plugin doesn’t add it to your robots.txt. Don’t rely on search engines finding out about your XML sitemap through your robots.txt.
