What’s the X-Robots-Tag HTTP header? And how to use it?

Traditionally, you use a robots.txt file on your server to manage which pages, folders, subdomains, or other content search engines are allowed to crawl. But did you know there’s also such a thing as the X-Robots-Tag HTTP header? Here, we’ll discuss the possibilities and when this might be a better option for your blog.

Quick recap: robots.txt

Before we continue, let’s look at what a robots.txt file does. In a nutshell, it tells search engines not to crawl a particular page, file, or directory of your website. Using this helps both you and search engines such as Google. By not providing access to specific, unimportant areas of your website, you can save on your crawl budget and reduce the load on your server.

Please note that using the robots.txt file to hide your entire website from search engines is not recommended. Our robots.txt ultimate guide has everything you need to know about this topic.

Say hello to X-Robots-Tag

In 2007, Google added support for the X-Robots-Tag directive. This means that you can not only restrict access for search engines via a robots.txt file but also programmatically set robots meta tag directives in the headers of an HTTP response. You might be thinking, “But can’t I just use the robots meta tag instead?”. The answer is yes. And no.

If you plan on programmatically blocking a particular HTML page, then using the meta tag should suffice. But if you want to keep search engines away from a non-HTML file such as an image or a PDF, there’s no place to put a meta tag, so the HTTP header is the way to go. You can also use the latter method if you don’t feel like adding additional HTML to your website.

Here’s an example of an HTTP response with an X-Robots-Tag instructing crawlers not to index a page and not to follow links on that page:

HTTP/1.1 200 OK
Date: Thu, 25 Nov 2021 20:12:23 GMT
(…)
X-Robots-Tag: noindex, nofollow
(…)
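
A quick way to confirm a header like this is actually being sent is to request only the response headers with curl. In the sketch below, a python3 one-liner stands in for your real web server so the example is runnable anywhere; in practice you’d point curl at a URL on your own site, and the port number is arbitrary:

```shell
# Start a throwaway local server that answers one HEAD request
# with an X-Robots-Tag header (stand-in for your real server).
python3 -c '
import http.server

class Handler(http.server.BaseHTTPRequestHandler):
    def do_HEAD(self):
        self.send_response(200)
        self.send_header("X-Robots-Tag", "noindex, nofollow")
        self.end_headers()
    def log_message(self, *args):  # silence request logging
        pass

http.server.HTTPServer(("127.0.0.1", 8043), Handler).handle_request()
' &
sleep 1
# -I sends a HEAD request; grep filters out the header we care about
curl -sI http://127.0.0.1:8043/ | grep -i x-robots-tag
```

If the header is set correctly, the last command prints the X-Robots-Tag line from the response.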

X-Robots-Tag directives

There are two different kinds of directives: crawler directives and indexer directives. We’ll briefly explain the difference below.

Crawler directives

The robots.txt file only contains the so-called ‘crawler directives’, which tell search engines where they are or aren’t allowed to go. By using this directive, you can specify where search engines are permitted to crawl:

Allow

This directive does the exact opposite:

Disallow

Additionally, you can use the following directive to point search engines to your XML sitemap, which helps them discover and crawl your content faster:

Sitemap

Note that it’s also possible to fine-tune the directives for a specific search engine by using the following directive in combination with the other directives:

User-agent

Keep in mind that pages can still show up in search results if enough links are pointing to them, even when you’ve explicitly blocked them with the following directive:

Disallow

This means that if you want to really hide something from the search engines, and thus from people using search, robots.txt won’t suffice.
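
Put together, a small robots.txt using these crawler directives might look like this (the paths and sitemap URL below are placeholders, not recommendations):

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

User-agent: Googlebot-Image
Disallow: /internal-images/

Sitemap: https://www.example.com/sitemap_index.xml
```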

Indexer directives

Indexer directives are set on a per-page and/or per-element basis. Up until 2007, there were two of them: the rel="nofollow" microformat, indicating that a link should not pass authority / PageRank, and the meta robots tag.

With the Meta Robots tag, you can really prevent search engines from showing pages you want to keep out of the search results. The same result can be achieved with the X-Robots-Tag HTTP header. As described earlier, the X-Robots-Tag gives you more flexibility by allowing you to control how specific file(types) are indexed. More on this topic in our meta robots ultimate guide.
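
For comparison, here’s what the meta robots equivalent of the noindex, nofollow header shown earlier looks like, placed in the head of an HTML page:

```
<meta name="robots" content="noindex, nofollow">
```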

Examples of the X-Robots-Tag in use

The theory is nice, but let’s see how you could use the X-Robots-Tag in the wild! If you want to prevent search engines from showing files you’ve generated with PHP, you could add the following in the head of the header.php file:

// note: header() must be called before any output is sent to the browser
header("X-Robots-Tag: noindex", true);

This would not prevent search engines from following the links on those pages. If you want to do that, then alter the previous example as follows:

header("X-Robots-Tag: noindex, nofollow", true);

Although this method in PHP has its benefits, you’ll more likely want to block specific file types altogether. A more practical approach is to add the X-Robots-Tag to your Apache server configuration or an .htaccess file. Imagine you run a website with some .doc files that you don’t want search engines to index. On Apache servers, you’d add the following to the configuration or an .htaccess file:

<FilesMatch "\.doc$">
Header set X-Robots-Tag "noindex, noarchive, nosnippet"
</FilesMatch>

Or, if you’d want to do this for both .doc and .pdf files:

<FilesMatch "\.(doc|pdf)$">
Header set X-Robots-Tag "noindex, noarchive, nosnippet"
</FilesMatch>

If you’re running Nginx instead of Apache, you can get a similar result by adding the following to the server configuration:

location ~* \.(doc|pdf)$ {
    add_header  X-Robots-Tag "noindex, noarchive, nosnippet";
}

There are cases in which the robots.txt file itself might show up in search results. By using an alteration of the previous method, you can prevent this from happening to your website:

<FilesMatch "^robots\.txt$">
Header set X-Robots-Tag "noindex"
</FilesMatch>

And in Nginx:

location = /robots.txt {
    add_header  X-Robots-Tag "noindex";
}

Conclusion

As you can see, based on the examples above, the X-Robots-Tag HTTP header is a potent tool. Use it wisely and cautiously, as you won’t be the first to block your entire site by accident. Nevertheless, it’s a great addition to your toolset if you know how to use it.

Read more: Meta robots tag: the ultimate guide »
