What are crawl errors?

Crawl errors occur when a search engine tries to reach a page on your website but fails. Let’s shed some more light on crawling first. Crawling is the process in which a search engine tries to visit every page of your website via a bot. A search engine bot finds a link to your website and starts finding all your public pages from there. The bot crawls the pages, indexes the content for use in Google, and adds all the links on these pages to the pile of pages it still has to crawl. Your main goal as a website owner is to make sure the search engine bot can get to all pages on the site. When this process fails, you get what we call crawl errors.
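
To make that crawling process a bit more concrete, here is a minimal sketch of how a bot could discover pages by following links. It is purely illustrative (real search engine bots are far more sophisticated), it only uses the Python standard library, and https://example.com/ is just a placeholder for your own site.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href values of all <a> tags on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += [value for name, value in attrs if name == "href" and value]


def crawl(start_url, max_pages=50):
    """Breadth-first crawl: fetch a page, queue its internal links, repeat."""
    host = urlparse(start_url).netloc
    seen, queue = set(), deque([start_url])
    while queue and len(seen) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "ignore")
        except Exception as error:  # DNS failure, timeout, 4xx/5xx response, ...
            print(f"crawl error on {url}: {error}")
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)
            if urlparse(absolute).netloc == host:  # stay on this site
                queue.append(absolute)
    return seen


if __name__ == "__main__":
    print(f"crawled {len(crawl('https://example.com/'))} pages")
```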

New to SEO? Learn the Basics of SEO in our Basic SEO course »

Your goal is to make sure that every link on your website leads to an actual page. That might be via a 301 redirect, but the page at the end of that redirect chain should always return a 200 OK server response.
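
If you want to check a single link yourself, a small sketch like the one below (Python standard library, with a placeholder URL) follows any redirects and reports the status code of the page at the end of the chain; you want that to be 200.

```python
import urllib.error
import urllib.request


def final_status(url):
    """Follow redirects and return (status code, final URL) for a link."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status, response.geturl()
    except urllib.error.HTTPError as error:  # 4xx and 5xx answers land here
        return error.code, url


status, final_url = final_status("https://example.com/some-old-link/")
print(f"{final_url} answered with {status}")
```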

Google divides crawl errors into two groups:

  1. Site errors. You don’t want these, as they mean your entire site can’t be crawled.
  2. URL errors. You don’t want these either, but since each one relates to a single specific URL, they are easier to pinpoint and fix.

Let’s elaborate on that.

Site errors

Site errors are all the crawl errors that prevent the search engine bot from accessing your website. That can happen for many reasons; these are the most common:

  • DNS errors. This means a search engine isn’t able to communicate with your server. It might be down, for instance, meaning your website can’t be visited. This is usually a temporary issue: Google will come back to your website later and crawl your site anyway. If you see DNS errors listed under crawl errors in Google Search Console, that probably means Google has tried a couple of times and still wasn’t able to reach your site.
  • Server errors. If your Search Console shows server errors, this means the bot wasn’t able to access your website. The request might have timed out: the search engine tried to visit your site, for instance, but the page took so long to load that the server returned an error message. Server errors also occur when there are flaws in your code that prevent a page from loading, or when your site has so many visitors that the server simply can’t handle all the requests. A lot of these errors are returned as 5xx status codes, like the 500 and 503 status codes.
  • Robots failure. Before crawling, Googlebot (for instance) tries to fetch your robots.txt file as well, just to see if there are any areas of your website you’d rather not have indexed. If the bot can’t reach the robots.txt file, Google will postpone the crawl until it can. So always make sure it’s available; a quick way to check this yourself is sketched right after this list.
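
Here is a rough self-check for the first and last of these site errors: the sketch below (Python standard library, with www.example.com as a placeholder for your own domain) tests whether the hostname resolves at all and whether robots.txt answers with a sensible status code.

```python
import socket
import urllib.error
import urllib.request

HOST = "www.example.com"  # placeholder: use your own domain here

# 1. DNS: can the hostname be resolved to an IP address at all?
try:
    print("DNS OK:", socket.gethostbyname(HOST))
except socket.gaierror as error:
    print("DNS error:", error)

# 2. robots.txt: is it reachable, and which status code does it return?
try:
    with urllib.request.urlopen(f"https://{HOST}/robots.txt", timeout=10) as response:
        print("robots.txt status:", response.status)
except urllib.error.HTTPError as error:
    print("robots.txt returned", error.code)
except urllib.error.URLError as error:
    print("robots.txt unreachable:", error.reason)
```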

That covers the crawl errors that relate to your entire site. Now let’s see what kind of crawl errors might occur for specific pages.

URL errors

As mentioned, URL errors refer to crawl errors that occur when a search engine bot tries to crawl a specific page of your website. When we discuss URL errors, we tend to discuss crawl errors like (soft) 404 Not Found errors first. You should frequently check for these types of errors (use Google Search Console or Bing Webmaster Tools) and fix ’em. If the page, or the subject of that page, is gone for good and never coming back to your website, serve a 410 Gone page. If you have similar content on another page, use a 301 redirect instead; both responses are illustrated in the sketch below. Make sure your sitemap and internal links are up to date as well, obviously.
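
To illustrate the difference between the two fixes, here is a minimal sketch of a server that answers 410 Gone for content that is gone for good and 301 Moved Permanently for content that lives on at another URL. It uses Python’s standard-library WSGI server, and the paths and redirect target are made up for the example.

```python
from wsgiref.simple_server import make_server

GONE = {"/discontinued-product/"}      # gone for good: answer 410
MOVED = {"/old-post/": "/new-post/"}   # moved elsewhere: answer 301


def app(environ, start_response):
    path = environ["PATH_INFO"]
    if path in GONE:
        start_response("410 Gone", [("Content-Type", "text/plain")])
        return [b"This page has been removed for good."]
    if path in MOVED:
        start_response("301 Moved Permanently", [("Location", MOVED[path])])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"This page exists and answers 200 OK."]


if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()  # try /old-post/ in a browser
```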

We found that a lot of these URL errors are caused by internal links, by the way. So a lot of these errors are your own fault. If you remove a page from your site at some point, adjust or remove any internal links pointing to it as well; those links serve no purpose anymore. If such a link remains in place, a bot will find it and follow it, only to hit a dead end (a 404 Not Found error) on your own website. You need to do some maintenance on your internal links now and then; a small check is sketched below.
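
As a starting point for that maintenance, the sketch below (Python standard library, with a placeholder start URL) lists the internal links on one page that answer with a 404, i.e. the dead ends described above; extend it to loop over all your pages for a fuller picture.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

START = "https://example.com/"  # placeholder: the page whose links you want to check


class Links(HTMLParser):
    """Collects the href values of all <a> tags on the page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs += [value for name, value in attrs if name == "href" and value]


parser = Links()
parser.feed(urlopen(START, timeout=10).read().decode("utf-8", "ignore"))

host = urlparse(START).netloc
for href in parser.hrefs:
    url = urljoin(START, href)
    if urlparse(url).netloc != host:  # only check links that stay on this site
        continue
    try:
        status = urlopen(url, timeout=10).status
    except HTTPError as error:
        status = error.code
    if status == 404:
        print("dead internal link:", url)
```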

Among these common errors, you might also see an occasional DNS error or server error for that specific URL. Re-check the URL later to see if the error has vanished. Be sure to use Fetch as Google and mark the error as fixed in Google Search Console, if that is your main monitoring tool. Our plugin can help you with that.

Very specific URL errors

There are some URL errors that apply to certain sites only. That’s why I’d like to list these separately:

  • Mobile-specific URL errors. These are page-specific crawl errors that occur on a modern smartphone. If you have a responsive website, these are unlikely to surface, except perhaps for that piece of Flash content you wanted to replace anyway. If you maintain a separate mobile subdomain like m.example.com, you might run into more errors. Think along the lines of faulty redirects from your desktop site to that mobile site. You might even have blocked parts of that mobile site with a line in your robots.txt.
  • Malware errors. If you encounter malware errors in your webmaster tools, this means Bing or Google has found malicious software on that URL. That might mean software was found that is used, for instance, “to gather guarded information, or to disrupt their operation in general” (Wikipedia). You need to investigate that page and remove the malware.
  • Google News errors. There are some specific Google News errors. There’s quite a list of these possible errors in Google’s documentation, so if your website is in Google News, you might get these crawl errors. They vary from the lack of a title to errors that tell you that your page doesn’t seem to contain a news article at all. Be sure to check for yourself if this applies to your site.

Fix your crawl errors

The bottom line of this article is definitely: if you encounter crawl errors, fix them. Checking for crawl errors now and then should be part of your site’s maintenance schedule. Besides that, if you have installed our premium plugin, you’ll have a convenient way in WordPress and/or TYPO3 to prevent crawl errors when, for instance, deleting a page. Be sure to check out these features yourself!

Read more: ‘Google Search Console: Crawl’ »


23 Responses to What are crawl errors?

  1. Hair
    By Hair on 23 April, 2018

    Great information! I will check now for errors in my website thanks to your post :)

  2. Shah
    By Shah on 23 April, 2018

    Do these types of errors affect SEO?
    Thanks.

    • Willemien Hallebeek
      By Willemien Hallebeek on 23 April, 2018

      Yes, crawl errors can negatively affect your SEO!

  3. Mens
    By Mens on 22 April, 2018

    True!
    Great post! Thank you so much for this important information i will share it with my friends for sure :)

  4. Margaret
    By Margaret on 17 April, 2018

    Thank you very much for an interesting and useful review!

  5. Ritu
    By Ritu on 15 April, 2018

    Collecting and fixing crawl errors comes under technical SEO. If we do it correctly, our pages can see improved site ranking.

  6. SEO Audit
    By SEO Audit on 14 April, 2018

    A complete article about crawl errors. I was looking for this information.

  7. Cletus
    By Cletus on 13 April, 2018

    Please help: my site always gets crawled late and most of my posts don’t make it to the front page.

  8. Sanjay Mishra
    By Sanjay Mishra on 13 April, 2018

    Thanks for your page! The information you shared helped me a lot!

  9. Yuen Mi
    By Yuen Mi on 12 April, 2018

    I’ve been thinking lately about whether I should remove old articles or revamp them by applying some SEO basics. This made me rethink removing them (404 errors, internal linking, etc.). Thanks a lot for the useful information!

  10. Firnandus
    By Firnandus on 12 April, 2018

    Yes, I had to work hard because of these errors and fix them on my own website; they can be generated by URL changes and by links to other articles that are sometimes included in the code on my site.

  11. Mark
    By Mark on 12 April, 2018

    I get a lot of crawl errors linked from spam sites. Hundreds a month. This wasn’t mentioned in the article. I don’t know whether to ignore them and let them pile up, keep deleting them or maybe a disavow file? The sites often format the URL incorrectly thus creating the 404 error in GSC. Contacting a website from China or Russia and asking them to stop is a scary thought as well.

    • Michiel Heijmans
      By Michiel Heijmans on 13 April, 2018

      Hi Mark,

      That’s a bummer. I think reaching out to a specialized company like Linkdetox would be a good idea? You can keep on disavowing these links, but a more thorough approach might be better.

  12. Suresh Dubey
    By Suresh Dubey on 12 April, 2018

    I was looking for this information. Thanks

  13. Basit Ansari
    By Basit Ansari on 12 April, 2018

    Thank you so much for explaining the importance of crawl errors.

  14. Hossein
    By Hossein on 12 April, 2018

    Hi,
    I recommend adding a notification badge icon in Yoast whenever a 404 URL is found.
    That way we can find and fix these errors faster.
    I also opened an issue about this on github:
    https://github.com/Yoast/wordpress-seo/issues/9396
    That would be very nice if you add this feature.
    Thanks

    • Willemien Hallebeek
      By Willemien Hallebeek on 13 April, 2018

      Thanks Hossein, we’ll have a look into it!

  15. Bill Bennett
    By Bill Bennett on 12 April, 2018

    Is there a simple way to check for internal 404 errors before the Google crawl robot arrives?

    • Willemien Hallebeek
      By Willemien Hallebeek on 12 April, 2018

      Hi Bill, You can use Screaming Frog https://www.screamingfrog.co.uk/seo-spider/ to find your 404s if you don’t want to use Google Search Console. Not sure if you will be faster than Google though, that probably depends on how new your site is and how often it gets crawled.

  16. Local Sales Offer
    By Local Sales Offer on 11 April, 2018

    Thank you for the well summarized article on crawl errors

