Your robots.txt file is a powerful tool when you’re working on a website’s SEO – but it should be handled with care. It allows you to deny search engines access to different files and folders, but often that’s not the best way to optimize your site.
What does “best practice” look like?
Search engines continually improve how they crawl the web and index content. That means what was best practice a few years ago may no longer work, or may even harm your site.
Today, best practice means relying on your robots.txt file as little as possible. In fact, it’s only really necessary to block URLs in your robots.txt file when you have complex technical challenges (e.g., a large eCommerce website with faceted navigation), or when there’s no other option.
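As a sketch of what such an exception might look like: an eCommerce site with faceted navigation can generate near-infinite filter URLs, and a targeted rule may be the only practical fix (the `?filter=` and `?sort=` parameter names below are hypothetical examples, not a recommendation for your site):

```
User-agent: *
Disallow: /*?filter=
Disallow: /*?sort=
```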
Blocking URLs via robots.txt is a ‘brute force’ approach, and can cause more problems than it solves.
For most sites, the following is best practice:
```
# This space intentionally left blank
# If you want to learn about why our robots.txt looks like this, read this post: https://yoa.st/robots-txt

User-agent: *
```
We even use this approach in our own robots.txt file.
What does this code do?
- The User-agent: * instruction states that any following instructions apply to all crawlers.
- Because we don’t provide any further instructions, we’re saying “all crawlers can freely crawl this site without restriction”.
- We also provide some information for humans looking at the file (linking to this very page), so that they understand why the file is ’empty’.
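You can check how a crawler interprets this minimal file with Python's built-in `urllib.robotparser`. The sketch below (the URL is a placeholder) confirms that a file containing only a comment and `User-agent: *` places no restrictions on crawling:

```python
from urllib import robotparser

# The minimal robots.txt shown above: a comment and a single
# User-agent line, with no Disallow rules.
lines = [
    "# This space intentionally left blank",
    "User-agent: *",
]

parser = robotparser.RobotFileParser()
parser.parse(lines)

# With no Disallow rules, every URL is crawlable by every user agent.
allowed = parser.can_fetch("Googlebot", "https://example.com/any-page/")
print(allowed)  # True
```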
If you have to disallow URLs
If you want to prevent search engines from crawling or indexing certain parts of your site, it’s almost always better to do so by adding meta robots tags or robots HTTP headers.
Our ultimate guide to meta robots tags explains how you can manage crawling and indexing ‘the right way’, and our Yoast SEO plugin provides the tools to help you implement those tags on your pages.
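As a minimal sketch, a noindex directive goes in the page's `<head>`:

```html
<!-- Keep this page out of the index, but let crawlers follow its links -->
<meta name="robots" content="noindex, follow">
```

The equivalent HTTP response header, useful for non-HTML files such as PDFs, is `X-Robots-Tag: noindex`.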
If your site has crawling or indexing challenges which can’t be fixed via meta robots tags or HTTP headers, or if you need to prevent crawler access for other reasons, you should read our ultimate guide to robots.txt.
Note that WordPress and Yoast SEO already automatically prevent indexing of some sensitive files and URLs, like your WordPress admin area (via an X-Robots-Tag HTTP header).
Why is this ‘minimalism’ best practice?
Robots.txt creates dead ends
Before you can compete for visibility in the search results, search engines need to discover, crawl and index your pages. If you’ve blocked certain URLs via robots.txt, search engines can no longer crawl through those pages to discover others. That might mean that key pages don’t get discovered.
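To illustrate the dead end, here is a sketch using Python's `urllib.robotparser` and a hypothetical `/category/` path: once a section is disallowed, an obedient crawler never fetches those pages, so a page linked only from inside that section is never discovered, even though its own URL isn't blocked:

```python
from urllib import robotparser

# A robots.txt that blocks an entire (hypothetical) section of the site.
lines = [
    "User-agent: *",
    "Disallow: /category/",
]

parser = robotparser.RobotFileParser()
parser.parse(lines)

# The blocked page is never fetched...
blocked = parser.can_fetch("*", "https://example.com/category/widgets/")
# ...so a product page linked *only* from it is never discovered,
# even though the product URL itself is not disallowed.
reachable = parser.can_fetch("*", "https://example.com/product/blue-widget/")
print(blocked, reachable)  # False True
```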
Robots.txt denies links their value
One of the basic rules of SEO is that links from other pages can influence your performance. If a URL is blocked, search engines won't crawl it, and they may not pass any 'link value' pointing at that URL on to other pages on your site.
Google fully renders your site
The previous best practice of blocking access to your wp-includes directory and your plugins directory via robots.txt is no longer valid, which is why we worked with WordPress to remove the default disallow rule for wp-includes in version 4.0.
You (usually) don’t need to link to your sitemap
The robots.txt standard supports adding a link to your XML sitemap(s) to the file. This helps search engines to discover the location and contents of your site.
We’ve always felt that this was redundant; you should already be adding your sitemap to your Google Search Console and Bing Webmaster Tools accounts in order to access analytics and performance data. If you’ve done that, then you don’t need the reference in your robots.txt file.
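If you do want to include the reference anyway, it is a single standalone directive (the URL below is a placeholder):

```
Sitemap: https://www.example.com/sitemap_index.xml
```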