Your robots.txt file is a powerful tool when working on a website’s SEO – but you should handle it with care. It allows you to deny search engines access to different files and folders, but often that’s not the best way to optimize your site. Here, we’ll explain how we think site owners should use their robots.txt file and propose a ‘best practice’ approach suitable for most websites.
You’ll find a robots.txt example that works for most WordPress websites further down this page. If you want to know more about how your robots.txt file works, you can read our ultimate guide to robots.txt.
What does a “best practice” look like?
Search engines continually improve how they crawl the web and index content. That means what used to be best practice a few years ago might not work anymore or may even harm your site.
Today, best practice means relying on your robots.txt file as little as possible. It’s only really necessary to block URLs in your robots.txt file when you have complex technical challenges (e.g., a large eCommerce website with faceted navigation) or no other option.
Blocking URLs via robots.txt is a ‘brute force’ approach and can cause more problems than it solves.
For most WordPress sites, the following example is best practice:
User-Agent: *
Disallow:
Sitemap: https://www.example.com/sitemap_index.xml
We even use this approach in our robots.txt file — although, sometimes, you will notice that we are testing some stuff.
What does this code do?
- The User-agent: * instruction states that any following instructions apply to all crawlers.
- The Disallow: directive comes without further instructions, so we’re saying, “all crawlers can freely crawl this site without restrictions.”
- In the robots.txt file, we also link to the location of the XML sitemap, making it easier for Google, Bing, and other search engines to find it.
- We also provide some information for humans looking at the file (linking to this very page) so that they understand why we set up the file the way that we did.
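If you would like to check this behaviour yourself, here is a minimal sketch using Python's standard-library robots.txt parser. The crawler names and page URLs are made up for illustration, and `site_maps()` assumes Python 3.8 or newer. Real search engines may interpret edge cases differently from this parser, so treat it as a sanity check rather than a definitive answer.

```python
from urllib.robotparser import RobotFileParser

# The example file discussed above, one directive per line.
EXAMPLE_ROBOTS_TXT = """\
User-Agent: *
Disallow:
Sitemap: https://www.example.com/sitemap_index.xml
"""


def load_rules(robots_txt):
    """Parse robots.txt text locally, without making a network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(EXAMPLE_ROBOTS_TXT)

# Hypothetical crawler names and URLs, used only for illustration.
agents = ("Googlebot", "Bingbot", "SomeOtherBot")
urls = (
    "https://www.example.com/",
    "https://www.example.com/blog/some-post/",
)

for agent in agents:
    for url in urls:
        # An empty Disallow: means nothing is blocked for this user agent.
        print(agent, url, rules.can_fetch(agent, url))

# Sitemap URLs declared in the file.
print(rules.site_maps())
```

Every `can_fetch` call should report that crawling is allowed, and the sitemap list should contain the single URL from the file.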
If you have to disallow URLs
If you want to prevent search engines from crawling or indexing certain parts of your WordPress site, it’s almost always better to do so by adding meta robots tags or robots HTTP headers.
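To make the two alternatives concrete, here is a small sketch that builds a meta robots tag and the equivalent X-Robots-Tag response header. The helper names are hypothetical and are not part of WordPress or Yoast SEO; in practice a plugin or your server configuration would output these for you.

```python
from html import escape


def robots_directives(index=True, follow=True):
    """Build a directive string such as 'noindex, follow'."""
    return ", ".join([
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ])


def meta_robots_tag(index=True, follow=True):
    """Return a meta robots tag to place in the <head> of an HTML page."""
    content = escape(robots_directives(index, follow), quote=True)
    return f'<meta name="robots" content="{content}">'


def x_robots_header(index=True, follow=True):
    """Return a (name, value) pair for the X-Robots-Tag HTTP response header."""
    return ("X-Robots-Tag", robots_directives(index, follow))


# Keep a page out of the index while still letting crawlers follow its links.
print(meta_robots_tag(index=False))

# The header form is useful for non-HTML files such as PDFs.
print(x_robots_header(index=False, follow=False))
```

Keep in mind that a crawler can only see either of these if it is allowed to fetch the URL, so a page carrying a noindex directive should not also be blocked in robots.txt.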
Our ultimate guide to meta robots tags explains how you can manage crawling and indexing ‘the right way,’ and our Yoast SEO plugin provides the tools to help you implement those tags on your pages.
If your site has crawling or indexing challenges that you can’t fix via meta robots tags or HTTP headers, or if you need to prevent crawler access for other reasons, you should read our ultimate guide to robots.txt.
Note that WordPress and Yoast SEO already automatically prevent indexing of some sensitive files and URLs, like your WordPress admin area (via an X-Robots-Tag HTTP header).
Why is this a best practice for WordPress SEO?
Robots.txt creates dead ends
Search engines need to discover, crawl and index your pages before you can compete for visibility in the search results. If you’ve blocked specific URLs via robots.txt, search engines can no longer crawl through those pages to discover others. That might mean that key pages don’t get discovered.
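The dead-end effect can be illustrated with a toy crawler. The sketch below is a simplification, not how any real search engine works: the link graph, the paths, and the `ExampleBot` name are all invented. It shows a page that is not blocked itself, but is never found because the only link to it sits on a blocked page.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical internal link graph: page path -> paths it links to.
SITE_LINKS = {
    "/": ["/blog/", "/filter/"],
    "/blog/": ["/blog/new-post/"],
    "/filter/": ["/products/hidden-gem/"],
    "/blog/new-post/": [],
    "/products/hidden-gem/": [],
}

OPEN_ROBOTS_TXT = "User-agent: *\nDisallow:\n"
BLOCKING_ROBOTS_TXT = "User-agent: *\nDisallow: /filter/\n"


def crawl(robots_txt, start="/", agent="ExampleBot"):
    """Breadth-first crawl that only fetches paths robots.txt allows."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    seen = {start}      # paths already queued, so nothing is queued twice
    fetched = set()     # paths the crawler was actually allowed to fetch
    queue = deque([start])
    while queue:
        path = queue.popleft()
        if not rules.can_fetch(agent, path):
            continue    # blocked: the crawler never sees this page's links
        fetched.add(path)
        for link in SITE_LINKS.get(path, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched


print(sorted(crawl(OPEN_ROBOTS_TXT)))
print(sorted(crawl(BLOCKING_ROBOTS_TXT)))
```

With the blocking file, `/products/hidden-gem/` is missing from the result even though no rule disallows it, because the crawler could not read the page that links to it.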
Robots.txt denies links their value
One of the basic rules of SEO is that links from other pages can influence your performance. If a URL is blocked, not only won’t search engines crawl it, but they also might not distribute any ‘link value’ pointing to that URL or through that URL to other pages on the site.
Google fully renders your site
The previous best practice of blocking access to your wp-includes directory and your plugins directory via robots.txt is no longer valid, which is why we worked with WordPress to remove the default disallow rule for wp-includes in version 4.0.
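As a rough illustration of why those old rules get in the way of rendering, the sketch below checks which assets a crawler would be refused. The legacy rules are reconstructed for this example using the standard WordPress directory paths, and the asset URLs, plugin name, and theme name are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed legacy rules, written out for illustration only.
LEGACY_ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
"""

# Hypothetical script and stylesheet URLs a page might load.
ASSET_URLS = [
    "https://www.example.com/wp-includes/js/jquery/jquery.min.js",
    "https://www.example.com/wp-content/plugins/example-plugin/style.css",
    "https://www.example.com/wp-content/themes/example-theme/style.css",
]


def blocked_assets(robots_txt, urls, agent="Googlebot"):
    """Return the URLs that the given robots.txt forbids the agent to fetch."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return [url for url in urls if not rules.can_fetch(agent, url)]


for url in blocked_assets(LEGACY_ROBOTS_TXT, ASSET_URLS):
    print("blocked:", url)
```

Under these assumed rules, the script in wp-includes and the plugin stylesheet are off limits, so a crawler that renders the page would be working without them.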
Linking to your XML sitemap helps discovery
The robots.txt standard supports adding a link to your XML sitemap(s) to the file. This helps search engines discover the location and contents of your site. In the case of Bing, it needs this link to verify your site — unless you added a link to the sitemap via their Webmaster Tools.
It might feel redundant because you should already add your sitemap to your Google Search Console and Bing Webmaster Tools accounts to access analytics and performance data. However, having that link in the robots.txt gives crawlers a foolproof way of discovering your sitemap.
Yoast SEO automatically adds a link to your XML sitemap if you haven’t generated a robots.txt file yet. If you already have a robots.txt file, you can add the rule Sitemap: https://www.example.com/sitemap_index.xml to your file via the file editor in the Tools section of Yoast SEO. Keep in mind that you should add the full URL to your XML sitemap. If you have multiple sitemaps, put each one on its own line, and give each one a full URL.
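If you want to double-check your sitemap lines, a sketch like the following lists the declared sitemaps and flags any that are not full URLs. The second and third sitemap entries are invented for this example (the third is deliberately wrong), and `site_maps()` assumes Python 3.8 or newer.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Example file with several sitemap lines; the last one is deliberately relative.
SITEMAP_ROBOTS_TXT = """\
User-agent: *
Disallow:
Sitemap: https://www.example.com/sitemap_index.xml
Sitemap: https://www.example.com/news-sitemap.xml
Sitemap: /relative-sitemap.xml
"""


def declared_sitemaps(robots_txt):
    """Return every Sitemap: value found in the robots.txt text."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.site_maps() or []  # site_maps() returns None when there are none


def is_full_url(url):
    """A full URL has an http(s) scheme and a host name."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def relative_sitemaps(robots_txt):
    """Return the declared sitemaps that are not full URLs."""
    return [url for url in declared_sitemaps(robots_txt) if not is_full_url(url)]


print(declared_sitemaps(SITEMAP_ROBOTS_TXT))
print("needs a full URL:", relative_sitemaps(SITEMAP_ROBOTS_TXT))
```

Only the relative entry should be flagged; the two entries with a scheme and host name pass the check.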
Assess your technical SEO fitness
Being mindful of your robots.txt file is an essential part of technical SEO. Curious how fit your site’s overall technical SEO is? We’ve created a technical SEO fitness quiz that helps you figure out what you need to work on!