Crawl directives

There are multiple ways to tell search engines how to behave on your site. These are called “crawl directives”. They allow you to do the following (a short example follows the list):

  • tell a search engine not to crawl a page at all;
  • tell it not to use a page in its index after it has crawled it;
  • tell it whether or not to follow the links on that page;
  • give a number of “minor” directives.
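
The first item is handled in robots.txt, the second and third in a robots meta tag. A minimal sketch of both, with a placeholder path and page (not a recommendation for any real site):

    # robots.txt: keep all crawlers out of one section of the site
    User-agent: *
    Disallow: /example-private-section/

    <!-- In the <head> of an individual page: don't index it, don't follow its links -->
    <meta name="robots" content="noindex, nofollow">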

We write a lot about these crawl directives as they are a very important weapon in an SEO’s arsenal. We try to keep these articles up to date as standards and best practices evolve.


Must-read articles about Crawl directives


Crawl budget optimization

5 July 2016 by Joost de Valk - 3 Comments

Google doesn’t always spider every page on a site instantly. In fact, sometimes it can take weeks. This might get in the way of your SEO efforts. Your newly optimized landing page might not get indexed. At that point, it becomes time to optimize your crawl budget. Crawl budget is the time Google has in a given …

Category: Technical SEO
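
One common crawl budget tactic, sketched here only as an illustration (the article may settle on different advice, and these URL patterns are placeholders), is to keep crawlers away from low-value, near-infinite URL spaces such as internal search results and faceted filters:

    # robots.txt: stop crawlers from spending their budget on search and filter URLs
    User-agent: *
    Disallow: /search/
    Disallow: /*?filter=

The * wildcard is understood by the major search engines, though not by every crawler.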

robots.txt: the ultimate guide

17 May 2016 by Joost de Valk - 5 Comments

The robots.txt file is one of the primary ways of telling a search engine where it can and can’t go on your website. All major search engines support the basic functionality it offers. There are some extra rules, used by only a few search engines, that can be useful too. This guide covers all …

Category: Technical SEO
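
As a taste of what the guide covers, here is a small robots.txt sketch with a group for all crawlers, an Allow exception, a stricter group for one specific (made-up) crawler, and a sitemap reference; all paths, the “ExampleBot” name and the URL are placeholders:

    # Rules for all crawlers
    User-agent: *
    Disallow: /example-downloads/
    Allow: /example-downloads/public/

    # A stricter group for one specific crawler
    User-agent: ExampleBot
    Disallow: /

    Sitemap: https://www.example.com/sitemap_index.xml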

rel=canonical: the ultimate guide

10 May 2016 by Joost de Valk - 38 Comments

A canonical URL allows you to tell search engines that certain similar URLs are actually one and the same. Sometimes you have products or content that is accessible under multiple URLs, or even on multiple websites. With a canonical URL (an HTML link tag with the attribute rel=canonical), these can exist without harming your rankings. What …

Categories: Content SEO, Technical SEO
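
The tag itself is a single line in the <head> of the duplicate (or parameterised) URL, pointing at the version you want search engines to treat as the original; the URLs below are placeholders:

    <!-- On https://www.example.com/product/?color=green, point at the preferred URL -->
    <link rel="canonical" href="https://www.example.com/product/">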

WordPress robots.txt example for great SEO

26 April 2016 by Joost de Valk - 36 Comments

The robots.txt file is a very powerful file if you’re working on a site’s SEO. At the same time, it also has to be used with care. It allows you to deny search engines access to certain files and folders, but that’s very often not what you want to do. Over the years, Google especially has changed …

Categories: Technical SEO, WordPress
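
Purely as an illustration (the article’s actual recommendation may be leaner), a baseline many WordPress sites use keeps crawlers out of the admin area while leaving admin-ajax.php reachable:

    User-agent: *
    Disallow: /wp-admin/
    Allow: /wp-admin/admin-ajax.php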

hreflang: the ultimate guide

5 April 2016 by Joost de Valk - 31 Comments

hreflang is a technical solution for sites that have similar content in multiple languages. A site owner wants search engines to point people to the most “fitting” language. Say a user is Dutch and the page that ranks is in English, but there’s also a Dutch version. You would want Google to show the Dutch page in …

Category: Technical SEO
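
For the Dutch/English case in the teaser, the annotations could look like this (placeholder URLs); every language version lists itself and all of its alternates in its <head> or in the XML sitemap:

    <link rel="alternate" hreflang="en" href="https://www.example.com/page/" />
    <link rel="alternate" hreflang="nl" href="https://www.example.com/nl/pagina/" />
    <link rel="alternate" hreflang="x-default" href="https://www.example.com/page/" />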

Google Panda 4, and blocking your CSS & JS

19 June 2014 by Joost de Valk - 79 Comments

A month ago, Google introduced its Panda 4.0 update. Over the last few weeks we’ve been able to “fix” a couple of sites that were hit by it. These sites both lost more than 50% of their search traffic in that update. When they returned, their previous positions in the search results came back. Sounds too good to be …

Category: WordPress

Preventing your site from being indexed, the right way

17 December 2009 by Joost de Valk - 36 Comments

It keeps amazing me that I keep seeing people use robots.txt files to prevent sites from being indexed and thus showing up in the search engines. You know why it keeps amazing me? Because robots.txt doesn’t actually do the latter, even though it does prevent your site from being crawled. Let’s go through some terms …

Category: Technical SEO
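
The short version of “the right way”: use a noindex directive rather than robots.txt, and leave the page crawlable so the directive can actually be seen. A minimal sketch:

    <!-- In the <head>: allow crawling, but keep the page out of the index -->
    <meta name="robots" content="noindex">

Blocking the same URL in robots.txt would be counter-productive: the crawler would never see the noindex, and the blocked URL could still show up in the search results without a snippet.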

Playing with the X-Robots-Tag HTTP header

20 January 2008 by Joost de Valk - 31 Comments

Ever since the announcement on the Google Blog, and more recently Yahoo’s announcement that they’ve enhanced their support for it, I’ve been meaning to play with the X-Robots-Tag header. This HTTP header lets you do what you’d normally do with a robots meta tag, but at the HTTP level, which has some pretty cool applications. …

Category: Technical SEO
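
The classic use case is applying robots directives to files that cannot carry a meta tag, such as PDFs. A sketch for Apache with mod_headers enabled (the file pattern and directive values are just examples):

    <FilesMatch "\.pdf$">
        Header set X-Robots-Tag "noindex, noarchive"
    </FilesMatch>

The response for matching files then carries an X-Robots-Tag: noindex, noarchive header, which search engines treat like the equivalent meta robots values.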

The ultimate guide to the meta robots tag

12 October 2007 by Joost de Valk - 11 Comments