
Preventing your site from being indexed, the right way

June 5th, 2017 – 9 Comments

We said it in 2009, and we’ll say it again: it keeps amazing us that there are still people using just a robots.txt file to prevent indexing of their site in Google or Bing. As a result, their site shows up in the search engines anyway. You know why it keeps amazing us? Because robots.txt doesn’t actually keep your site out of the search results, even though it does prevent indexing of your site. Let me explain how this works in this post.

For more on robots.txt, please read robots.txt: the ultimate guide.

There is a difference between being indexed and being listed in Google

Before we explain things any further, we need to go over some terms here first:

  • Indexed / Indexing
    The process of downloading a site or a page’s content to the server of the search engine, thereby adding it to its “index”.
  • Ranking / Listing / Showing
    Showing a site in the search result pages (aka SERPs).

So, while the most common process goes from indexing to listing, a site doesn’t have to be indexed to be listed. If a link points to a page, a domain or wherever, Google follows that link. If the robots.txt on that domain prevents the search engine from indexing that page, it’ll still show the URL in the results if it can gather from other signals that it might be worth looking at. In the old days, those signals could have come from DMOZ or the Yahoo! directory, but I can imagine Google using, for instance, your My Business details these days, or the old data from those projects. There are more sites that summarize your website, after all.

Now if the explanation above doesn’t make sense, have a look at this 2009 Matt Cutts video explanation:

If you have reasons to prevent indexing of your website, adding that request to the specific pages you want to block, like Matt describes, is still the right way to go. But you’ll need to let Google see that meta robots tag. So, if you want to effectively hide pages from the search engines, you actually need them to index those pages, even though that might seem contradictory. There are two ways of doing that.
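
To make that concrete: a robots.txt like the one below (a minimal sketch of exactly the setup we’re warning against) prevents search engines from ever fetching your pages, so they will never see a meta robots tag on them:

User-agent: *
Disallow: /

With this in place, Google can’t download your pages, but it can still list the bare URLs based on links and other signals.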

Prevent listing of your page by adding a meta robots tag

The first option to prevent listing of your page is by using robots meta tags. We’ve got an ultimate guide on robots meta tags that’s more extensive, but it basically comes down to adding this tag to your page:

<meta name="robots" content="noindex,nofollow">
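
For context, here’s a minimal sketch of where the tag sits; everything around it is just illustrative markup:

<!DOCTYPE html>
<html>
<head>
  <!-- Keep this page out of the search results and don't follow its links -->
  <meta name="robots" content="noindex,nofollow">
  <title>A page you want to hide</title>
</head>
<body>
  ...
</body>
</html>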

The issue with a tag like that is that you have to add it to each and every page.

Or by adding an X-Robots-Tag HTTP header

To make the process of adding the meta robots tag to every single page of your site a bit easier, the search engines came up with the X-Robots-Tag HTTP header. This allows you to send an HTTP header called X-Robots-Tag and set its value as you would the meta robots tag’s value. The cool thing about this is that you can do it for an entire site at once. If your site is running on Apache, and mod_headers is enabled (it usually is), you could add the following single line to your .htaccess file:

Header set X-Robots-Tag "noindex, nofollow"

This would have the effect that the entire site can be indexed, but will never be shown in the search results.
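
If you don’t want to keep the whole site out of the results, the header can also be scoped. As a sketch, assuming Apache with mod_headers again, this sends the header only for PDF and Word files:

<FilesMatch "\.(pdf|docx?)$">
  # Send the noindex header only for matching file types
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>

On nginx, assuming you can edit the server configuration, the site-wide equivalent would be add_header X-Robots-Tag "noindex, nofollow"; in the server block.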

So, get rid of that robots.txt file with Disallow: / in it. Use the X-Robots-Tag or that meta robots tag instead!

Read more: ‘The ultimate guide to the meta robots tag’ »


9 Responses to Preventing your site from being indexed, the right way

  1. Javi
    By Javi on 8 June, 2017

    Thank you so much for this article.
    Since my robots.txt is just a lot of ‘Disallow’ lines, will I need to erase every line of code?

    Thank you

  2. Steve Horn
    By Steve Horn on 6 June, 2017

    I feel really dumb. How can “noindex” mean the site CAN be indexed?

    • Willemien Hallebeek
      By Willemien Hallebeek on 9 June, 2017

      Hi Steve, the terminology is quite confusing :-) It would be better to say the site can be indexed, but not shown in the search results.

  3. Entity Creative
    By Entity Creative on 6 June, 2017

    Thanks Christoph, I run into at least 4 sites per year where the client’s robots.txt file is set to block indexing. The devil is in the details.

    Best,

    Britt

  4. Christoph Daum
    By Christoph Daum on 6 June, 2017

    I would even add one thing: when I develop a new website (or relaunch one), I use the .htaccess to block Google and nosy visitors.

    I mostly have a static IP, so I add it to a whitelist.

    This is my addition to the .htaccess:

    # Password-protect everything with HTTP Basic Auth
    AuthType Basic
    AuthName "Restricted Access"
    AuthUserFile /path/to/.htpasswd
    Require valid-user

    # Deny every IP except the whitelisted one
    Order deny,allow
    Deny from all
    Allow from 192.168.123.123

    # Grant access if EITHER rule passes: valid login OR whitelisted IP
    Satisfy Any

    The .htpasswd user can be something very basic, like admin/password, as I just want to block search engines and nosy eyes.
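
    In case you’re wondering how the .htpasswd file itself gets created: assuming the standard Apache tooling is available, the htpasswd utility does it:

    htpasswd -c /path/to/.htpasswd admin

    The -c flag creates the file; it then prompts you for the password.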

    The disadvantage at first: you block everything, so it’s useless as a partial solution.
    The advantages:
    besides blocking search engines, you block nosy visitors; with the whitelisted IP addresses, you can share the link with all users behind that IP (inside your company, for example), and you can add more IPs if needed.
    And one of the biggest advantages: you cannot forget to remove it when going live, as it will immediately throw an error when you check from your mobile phone, or the first visitors will drop you a message: “hey, it says enter a password, there is something wrong”. If you instead forget to remove a robots.txt block or something similar, you are live without allowing Google to enter the site.

    • Joost de Valk
      By Joost de Valk on 7 June, 2017

      We often use the members plugin for similar blocking, but yeah, I agree on the added benefits :)

      • Christoph Daum
        By Christoph Daum on 8 June, 2017

        In my former company, a members plugin was used to block Google, but without any further measures, and without talking to me or to the SEO manager. In the end, a lot of pages with the login mask of that members plugin were indexed, resulting in some grey hairs and useless work for the SEO manager to get rid of these pages again.

  5. priya
    By priya on 6 June, 2017

    This article presents a clear idea for people new to blogging. Thanks for sharing.

  6. farhan asif
    By farhan asif on 5 June, 2017

    Very informative article. Thank you so much.

