HTTP 503: Handling site maintenance correctly for SEO

Last week I got a few messages from Google Webmaster Tools saying it couldn’t access the robots.txt file on a client’s site. It turned out the client didn’t handle scheduled downtime correctly, which caused problems with Google. While this article covers some rather basic technical SEO, the last bit might be interesting for more advanced users. The message from Google Webmaster Tools read like this:

Over the last 24 hours, Googlebot encountered 41 errors while attempting to access your robots.txt. To ensure that we didn’t crawl any pages listed in that file, we postponed our crawl. Your site’s overall robots.txt error rate is 7.0%

HTTP status codes and search engines

A search engine constantly verifies whether the content it’s linking to still exists and hasn’t changed. It verifies two things:

  1. is the content still being served with the correct HTTP status code (HTTP 200);
  2. is it still the same content.

An HTTP 200 status code means: all is well, here is the content you asked for. It is the only correct status code for content. If content has moved, you can redirect it, either permanently, with an HTTP 301 header, or temporarily, with an HTTP 302 or 307 header.

If your server returns any other HTTP status code, the search engine takes that to mean it can no longer find the content. If your server returns a 200 HTTP status code but the page is in fact an error that says something like “File not found”, or has very little content, Google will classify it as a soft 404 in Google Webmaster Tools.

There is only one proper way of telling a search engine that you’re doing site maintenance: the HTTP 503 status code.

How server downtime works for search engines

If, during a crawl, a search engine finds that some content no longer exists, i.e. the server returns a 404 HTTP status, it will usually remove that content from the search results until it can come back and verify that it’s there again. If this happens often, it’ll take longer and longer for the content to come back in the search results.

What you should be doing instead is sending a 503 HTTP status code. This is the definition of the 503 status code from RFC 2616, the RFC that defines these status codes:

The server is currently unable to handle the request due to a temporary overloading or maintenance of the server. The implication is that this is a temporary condition which will be alleviated after some delay. If known, the length of the delay MAY be indicated in a Retry-After header. If no Retry-After is given, the client SHOULD handle the response as it would for a 500 response.

So you should send a 503 status code, preferably in combination with a Retry-After header. Basically, you’re saying: hang on, we’re doing some maintenance, please come back in X minutes. That sounds a lot better than what a 404 error says: “Not Found”. A 404 literally means that the server can’t find anything to return for the URL that was given.
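Retry-After can carry either a number of seconds or an absolute HTTP date. To make that concrete from the crawler’s side, here is a minimal PHP sketch of how a client might turn the header value into a wait time. The function name parse_retry_after() is hypothetical, and this is not a description of how Googlebot actually works; it simply returns null when there is no usable hint, in which case the RFC text above says to treat the response like a 500.

```php
<?php
// Hypothetical helper: convert a Retry-After header value into a delay in
// seconds, relative to $now (a Unix timestamp). Retry-After may hold either
// delay-seconds ("3600") or an HTTP date ("Tue, 19 Mar 2013 12:00:00 GMT").
// Returns null when there is no usable hint.
function parse_retry_after(string $value, int $now): ?int
{
    $value = trim($value);
    if ($value === '') {
        return null;
    }

    // Form 1: a non-negative whole number of seconds.
    if (ctype_digit($value)) {
        return (int) $value;
    }

    // Form 2: an absolute date; strtotime() returns false if it can't parse it.
    $timestamp = strtotime($value);
    if ($timestamp === false) {
        return null;
    }

    // A date in the past means "you may retry right away".
    return max(0, $timestamp - $now);
}

$now = gmmktime(11, 0, 0, 3, 19, 2013); // 19 March 2013, 11:00 GMT
var_dump(parse_retry_after('3600', $now));                          // int(3600)
var_dump(parse_retry_after('Tue, 19 Mar 2013 12:00:00 GMT', $now)); // int(3600)
var_dump(parse_retry_after('', $now));                              // NULL
```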

How do I send a 503 header?

In PHP, the code to send a 503 looks like this:

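// Use the same HTTP version as the request, defaulting to HTTP/1.0
$protocol = "HTTP/1.0";
if ( isset( $_SERVER["SERVER_PROTOCOL"] ) && "HTTP/1.1" == $_SERVER["SERVER_PROTOCOL"] )
  $protocol = "HTTP/1.1";
// Send the 503 status and ask clients to come back in 3600 seconds
header( "$protocol 503 Service Unavailable", true, 503 );
header( "Retry-After: 3600" );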

The delay, 3600 in the above example, is given in seconds, so 3600 corresponds to 60 minutes. You can also specify the exact time when the client should come back by sending a date in GMT instead of a number of seconds. That would look something like this:

header( "Retry-After: Tue, 19 Mar 2013 12:00:00 GMT" );

Use that with caution, though: setting it to a wrong date might give unexpected results!
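Typing the date by hand is exactly where the weekday and the day of the month can drift apart. As a minimal sketch, with hypothetical helper names, PHP’s gmdate() can produce either form of the value from a number of seconds or a Unix timestamp; in a real maintenance page you would pass the result to header() rather than echo it.

```php
<?php
// Hypothetical helpers for building the Retry-After value of a 503 response.
// Either form is valid; pick one per response.

// Relative form: how many seconds the client should wait (never negative).
function retry_after_seconds(int $seconds): string
{
    return (string) max(0, $seconds);
}

// Absolute form: an HTTP date in GMT. gmdate() derives the weekday from the
// timestamp, so the weekday can't disagree with the date.
function retry_after_date(int $timestamp): string
{
    return gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
}

// Example: maintenance is planned to end on 19 March 2013 at 12:00 GMT.
$end = gmmktime(12, 0, 0, 3, 19, 2013);
echo 'Retry-After: ' . retry_after_seconds(3600), PHP_EOL; // Retry-After: 3600
echo 'Retry-After: ' . retry_after_date($end), PHP_EOL;    // Retry-After: Tue, 19 Mar 2013 12:00:00 GMT
```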

Our site is never down, we’re on WordPress

Nonsense. Every time you upgrade your WordPress core, or update plugins, WordPress shows a maintenance page. The default page sends a proper 503 header. You can replace the default page with a maintenance.php file in your wp-content folder, but if you do, you have to make sure that file sends the proper 503 headers too. You can copy the code from the wp_maintenance() function.
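As a minimal sketch of such a wp-content/maintenance.php, using plain PHP only (the function name mytheme_maintenance_status_line() and the 600-second delay are illustrative choices, not WordPress’s own code), the file sends its status and Retry-After headers before any HTML. The wp_maintenance() function in your own WordPress version remains the safer reference.

```php
<?php
// wp-content/maintenance.php: a minimal sketch of a custom maintenance page.
// WordPress includes this file instead of showing its default page while it
// is in maintenance mode, so the file itself has to send the 503 status.
// mytheme_maintenance_status_line() is a hypothetical name.

function mytheme_maintenance_status_line(): string
{
    // Answer with the request's protocol version; fall back to HTTP/1.0.
    $protocol = $_SERVER['SERVER_PROTOCOL'] ?? 'HTTP/1.0';
    if ($protocol !== 'HTTP/1.1' && $protocol !== 'HTTP/1.0') {
        $protocol = 'HTTP/1.0';
    }
    return "$protocol 503 Service Unavailable";
}

// Headers must go out before any HTML is printed.
if (!headers_sent()) {
    header(mytheme_maintenance_status_line(), true, 503);
    header('Retry-After: 600'); // illustrative: come back in 10 minutes
    header('Content-Type: text/html; charset=utf-8');
}

echo <<<HTML
<!DOCTYPE html>
<html>
<head><title>Down for maintenance</title></head>
<body><p>We're briefly down for scheduled maintenance. Please check back in a few minutes.</p></body>
</html>
HTML;
```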

If your database is down, WordPress actually sends an internal server error, using the dead_db() function. So if you’re doing planned maintenance on your database, you’ll need to set up a custom database error page, db-error.php in your wp-content folder, that sends a proper 503 header.
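A db-error.php along the same lines could look like the sketch below. It assumes a hypothetical MYSITE_DB_MAINTENANCE_END constant, defined in wp-config.php as a Unix timestamp, for planned maintenance windows: when that is set and still in the future, the page sends the end time as an HTTP date, otherwise a short delay in seconds. It uses http_response_code(), available since PHP 5.4, to set the status.

```php
<?php
// wp-content/db-error.php: a minimal sketch. WordPress's dead_db() includes
// this file when it can't reach the database, so it must send its own 503.
// MYSITE_DB_MAINTENANCE_END is a hypothetical constant (a Unix timestamp)
// you could define in wp-config.php for a planned maintenance window.

function mysite_db_error_retry_after(int $now): string
{
    if (defined('MYSITE_DB_MAINTENANCE_END') && MYSITE_DB_MAINTENANCE_END > $now) {
        // Planned window: tell clients exactly when to come back (HTTP date, GMT).
        return gmdate('D, d M Y H:i:s', MYSITE_DB_MAINTENANCE_END) . ' GMT';
    }
    // No planned end time known: suggest retrying in 5 minutes.
    return '300';
}

if (!headers_sent()) {
    http_response_code(503);
    header('Retry-After: ' . mysite_db_error_retry_after(time()));
    header('Content-Type: text/html; charset=utf-8');
}

echo <<<HTML
<!DOCTYPE html>
<html>
<head><title>Database maintenance</title></head>
<body><p>We're doing some database maintenance. Please check back shortly.</p></body>
</html>
HTML;
```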

Beware caches!

So where did our client go wrong?

Funnily enough, our client had properly configured 503 headers on their server. There was an issue, though: they use Varnish as a cache, and Varnish didn’t pass the 503 status code through correctly; it replaced it with a “general” HTTP 500 status, which caused Google to send that error email. I haven’t had a chance to test whether that is default Varnish behavior or something they broke, but it’s worth testing in your own environment.
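To test this in your own environment, one approach is to request the same URL from the backend directly and through the cache while maintenance mode is on, and compare the status codes. Below is a minimal sketch using PHP’s get_headers(), which needs allow_url_fopen enabled; the function names are hypothetical.

```php
<?php
// Sketch: check whether a cache in front of your site passes a 503 through
// unchanged. Function names are hypothetical.

// Extract the status code from the first status line in a get_headers()
// result, e.g. "HTTP/1.1 503 Service Unavailable" -> 503.
function first_status_code(array $headers): ?int
{
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            return (int) $m[1];
        }
    }
    return null;
}

// Fetch a URL's response headers and return its status code (null on failure).
function fetch_status_code(string $url): ?int
{
    $headers = @get_headers($url);
    return $headers === false ? null : first_status_code($headers);
}

// True when visitors (and Googlebot) get the same status as the backend sends.
function cache_preserves_status(string $backendUrl, string $publicUrl): bool
{
    $backend = fetch_status_code($backendUrl);
    return $backend !== null && $backend === fetch_status_code($publicUrl);
}
```

For example, with placeholder addresses, cache_preserves_status('http://127.0.0.1:8080/robots.txt', 'http://www.example.com/robots.txt') should return true when both answer 503. Because get_headers() follows redirects by default and lists every status line in the chain, the helper only looks at the first one, which belongs to the URL you asked for.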

Pro tip: sending a 503 for your robots.txt

According to a post by Pierre Far of Google, if you send an HTTP 503 status code for your robots.txt, Google will halt all crawling on your domain until it’s allowed to crawl the robots.txt again. This is actually a very useful way of reducing the load on your server during maintenance. You still need to send a 503 for every URL on your server, including all static ones, but once Google has re-fetched the robots.txt, it’ll probably stop hammering your server(s) for a while.
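One way to apply this without editing a static robots.txt file is to route /robots.txt to a small PHP script, for instance with an internal rewrite in your web server configuration (not shown here). The sketch below assumes a hypothetical maintenance.flag file that you create when maintenance starts and delete when it ends. It only covers robots.txt; the rest of your URLs still need their own 503.

```php
<?php
// robots.php: a hypothetical script served for /robots.txt. While the
// maintenance flag file exists it answers 503, otherwise it serves the
// normal robots.txt rules.

function robots_response(bool $maintenanceOn, string $rules): array
{
    if ($maintenanceOn) {
        // 503 + Retry-After: "come back later", not "there is no robots.txt".
        return ['status' => 503, 'headers' => ['Retry-After: 3600'], 'body' => ''];
    }
    return [
        'status'  => 200,
        'headers' => ['Content-Type: text/plain; charset=utf-8'],
        'body'    => $rules,
    ];
}

// Hypothetical flag file you create when maintenance starts.
$maintenanceOn = file_exists(__DIR__ . '/maintenance.flag');
$response = robots_response($maintenanceOn, "User-agent: *\nDisallow:\n");

if (!headers_sent()) {
    http_response_code($response['status']);
    foreach ($response['headers'] as $header) {
        header($header);
    }
}
echo $response['body'];
```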

Conclusion: know what HTTP headers you’re sending

While writing this article, I was reminded of a tweet quoting Vanessa Fox during last week’s SMX West:

I couldn’t agree more and would add to that: at all times. Now go check those headers!


14 Responses

  1. Yousaf on 18 March, 2013

    No one does this and it annoys me!

    I emailed the development company of one of my clients and they didn’t even know what it was.

  2. Zimbrul on 18 March, 2013

    I haven’t done that and my site is now de-indexed by Google.

    • vicky sadhu on 11 April, 2013

      Same happened to me too. :( What do I do now? :(

  3. Toni Anicic on 18 March, 2013

    Thanks for the Varnish part, it might explain one thing I saw a few months ago :)

  4. Mary Kay Lofurno on 19 March, 2013

    Very useful post, definitely one for my Delicious account.

    Thanks, Mary Kay

  5. Chris on 22 March, 2013

    It’s not just due to site maintenance, it seems. I run a few different WordPress-based sites aside from my main site. One of these sites gets a fair amount of traffic as well as crawlers and other bots. I actually took a look at Wikipedia’s robots.txt to see how they deal with these bots: http://en.wikipedia.org/robots.txt
    What I noticed was that simply making changes to a robots.txt that Google has previously fetched in Webmaster Tools can cause Google not to pick up the new robots.txt for up to two weeks (with crawls put on hold during that time). I don’t have any hard evidence, but I haven’t changed any content on that site since it launched; all I did was update the robots.txt, and I saw the exact same error that you mention.

    Chris

  6. Tim Osborn on 22 March, 2013

    Great tip, thanks! My maintenance redirect has always sent a 302. Is the following a reasonable adaptation?

    I set .htaccess to redirect any request not from a defined IP address to maintenance.php with a 302, then send your 503 headers using PHP.

    This way, I can still work on the site unimpeded while everyone else gets a polite page, but do those redirects in succession make sense to Google?

    • Tim Osborn on 22 March, 2013

      Ah! I see code is welcome:

      # http://perishablepress.com/press/2010/05/19/htaccess-redirect-maintenance-page-site-updates/
      <IfModule mod_rewrite.c>
       RewriteEngine on
       
       # let this (escaped) IP address see the real site:
       RewriteCond %{REMOTE_ADDR} !^123\.45\.67\.89$
      
       RewriteCond %{REQUEST_URI} !/maintenance\.php$ [NC]
       RewriteCond %{REQUEST_URI} !\.(jpe?g|png|gif|css|js)$ [NC]
       RewriteRule .* /maintenance.php [R=302,L]
      </IfModule>
      
      • Tim Osborn on 22 March, 2013

        :( sorry.. the commented address was linkified, and the IP example unescaped.. you might like to remove these, Joost, sorry!

  7. Chris on 22 March, 2013

    For the developer types, here is another way to handle this as well.
    http://wptip.me/wordpress-maintenance-mode-without-a-plugin

  8. Jaipurwebs on 30 March, 2013

    Yes, this is a really nice post on handling HTTP 503 for a website. Thanks for sharing.

  9. nucleoseo on 1 April, 2013

    Hi Joost,

    Do you know how I can apply this code to a Magento e-commerce site? I tried your advice but it’s still not working. I verify my HTTP requests here: web-sniffer.net

  10. James on 1 April, 2013

    An article full of really good advice.

  11. Mathew Porter on 16 April, 2013

    This was a lifesaver when I was moving servers to my new VPS, which is so much better than the shared reseller account I was on. There were lots of issues when moving, so there was a lot of downtime. Anyway, moral of the story: do things right for damage limitation.