What are crawl errors?

Michiel Heijmans | 11 April 2018 | Crawl directives, SEO basics

Crawl errors occur when a search engine tries to reach a page on your website but fails. Let’s shed some more light on crawling first. Crawling is the process in which a search engine bot tries to visit every page of your website. The bot finds a link to your website and, from there, starts looking for all your public pages. It crawls those pages, indexes their contents for use in Google, and adds all the links on them to the list of pages it still has to crawl. Your main goal as a website owner is to make sure the search engine bot can reach every page on your site. When it can’t, the result is what we call a crawl error.
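The crawling process described above can be sketched as a breadth-first traversal. The Python below is a minimal illustration, not how Google’s crawler actually works: `crawl` and `LinkCollector` are hypothetical names, and `fetch(url)` is an assumed helper that follows redirects and returns the final status code and the page’s HTML. Any URL that doesn’t end in a 200 OK is recorded as a crawl error.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href values of all <a> tags on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, fetch, max_pages=1000):
    """Breadth-first crawl of a single host; returns {url: status} for failures.

    `fetch(url)` is a hypothetical helper that follows redirects and returns
    (final_status_code, html).
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])  # pages the bot still has to crawl
    seen = {start_url}          # pages already queued, so none is visited twice
    errors = {}
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        visited += 1
        status, html = fetch(url)
        if status != 200:
            errors[url] = status  # a crawl error for this URL
            continue
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links and drop #fragments.
            link, _ = urldefrag(urljoin(url, href))
            # Stay on this site and queue each page only once.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return errors
```

With a fake `fetch` that returns 404 for a page that was removed but is still linked, `crawl` reports only that page.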

Your goal is to ensure that every link on your website leads to an actual page. That might be via a 301 redirect, but the page at the end of that link should always return a 200 OK server response.
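To check a link against that rule, you need the sequence of status codes it produced. The sketch below assumes you have already collected that sequence (for example with an HTTP client configured not to follow redirects automatically); `ends_in_200` is a hypothetical helper that accepts any chain of 3xx redirects as long as the last response is 200 OK.

```python
def ends_in_200(hops):
    """Return True if a link's response chain is healthy.

    `hops` lists the status codes served for one link, in order,
    e.g. [301, 200] for a link that redirects once before resolving.
    """
    if not hops:
        return False  # no response at all
    *redirects, final = hops
    # Every hop before the last must be a redirect (3xx) ...
    if any(not 300 <= code < 400 for code in redirects):
        return False
    # ... and the page at the end of the chain must answer 200 OK.
    return final == 200
```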

Google divides crawl errors into two groups:

Site errors. You don’t want these, as they mean your entire site can’t be crawled.
URL errors. You don’t want these, but since they only relate to one specific URL per error, they are easier to maintain and fix.

Let’s elaborate on that.

Site errors

Site errors are all the crawl errors that prevent the search engine bot from accessing your website at all. They can have many causes; these are the most common:

DNS errors. This means a search engine couldn’t look up your domain name, so it isn’t able to communicate with your server. Your DNS server might be down, for instance, meaning your website can’t be visited. This is usually a temporary issue. Google will come back to your website later and crawl it anyway. If you see notices of this in the crawl errors report of Google Search Console, that probably means Google has tried a couple of times and still wasn’t able to reach your site.
Server errors. If Search Console shows server errors, the bot couldn’t access your website. The request might have timed out: the search engine tried to visit your site, but it took so long to load that the server returned an error message instead. Server errors also occur when flaws in your code prevent a page from loading. They can also mean that your site has so many visitors that the server couldn’t handle all the requests. Many of these errors are returned as 5xx status codes, like 500 (Internal Server Error) and 503 (Service Unavailable).
Robots failure. Before crawling, Googlebot (for instance) tries to fetch your robots.txt file, to see if there are any areas of your website you’d rather not have crawled. If the bot can’t reach the robots.txt file, Google will postpone the crawl until it can. So always make sure the file is available.
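The three site-level failures above can be told apart from the outcome of a request. This is a rough sketch with hypothetical function names: `classify_site_error` maps a Python exception or status code to a group, and `robots_decision` encodes an assumption about robots.txt handling based on Google’s robots.txt documentation: a 5xx or 429 response (or no response at all) postpones the crawl, while other 4xx responses are treated as if no robots.txt exists. Redirects on robots.txt are assumed to have been followed before calling it.

```python
import socket


def classify_site_error(exc=None, status=None):
    """Map one request outcome to a site-error group: "dns", "server", or None."""
    if isinstance(exc, socket.gaierror):
        # The host name could not be resolved: a DNS error.
        return "dns"
    if isinstance(exc, (socket.timeout, TimeoutError, ConnectionError)):
        # Timed out or refused before a response arrived.
        return "server"
    if status is not None and 500 <= status <= 599:
        return "server"
    return None


def robots_decision(status):
    """Decide what to do after requesting /robots.txt.

    Assumption (after Google's robots.txt docs): no response, 429, or 5xx
    postpones the crawl; any other 4xx means "no restrictions".
    """
    if status is None or status == 429 or 500 <= status <= 599:
        return "postpone crawl"
    if 400 <= status <= 499:
        return "crawl without restrictions"
    return "parse rules and crawl"
```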

That covers the crawl errors related to your entire site. Now let’s see which crawl errors might occur for specific pages.

URL errors

As mentioned, URL errors are crawl errors that occur when a search engine bot tries to crawl a specific page of your website. When we discuss URL errors, we usually start with (soft) 404 Not Found errors. You should check for these errors frequently (use Google Search Console or Bing Webmaster Tools) and fix them. If the page, or the subject of that page, is gone and will never return to your website, serve a 410 Gone status. If you have similar content on another page, use a 301 redirect to it instead. Make sure your sitemap and internal links are up to date as well.
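In code, the choice between 410 and 301 comes down to which status line the server sends. Here is a minimal WSGI sketch using only Python’s standard library; the paths in `REDIRECTS`, `GONE`, and `PAGES` are made-up examples, and in practice you would usually configure this in your web server or CMS rather than in application code.

```python
# Hypothetical site data: which URLs moved and which are gone for good.
REDIRECTS = {"/old-guide": "/new-guide"}  # similar content lives elsewhere
GONE = {"/discontinued-product"}          # removed, never coming back
PAGES = {"/new-guide": "<h1>New guide</h1>"}


def app(environ, start_response):
    """Minimal WSGI app: 301 for moved pages, 410 for removed ones."""
    path = environ.get("PATH_INFO", "/")
    if path in REDIRECTS:
        start_response("301 Moved Permanently", [("Location", REDIRECTS[path])])
        return [b""]
    if path in GONE:
        start_response("410 Gone", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"This page has been removed permanently."]
    if path in PAGES:
        start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
        return [PAGES[path].encode("utf-8")]
    # Anything else is an ordinary 404 Not Found.
    start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Not found"]
```

To try it locally, you could serve `app` with `wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever()` and request the paths in a browser.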

By the way, we found that a lot of these URL errors are caused by internal links, which means many of them are your own fault. If you remove a page from your site at some point, adjust or remove any internal links pointing to it as well; they serve no purpose anymore. If such a link stays in place, a bot will find and follow it, only to hit a dead end (a 404 Not Found error) on your website. So do some maintenance on your internal links now and then!
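Finding those dead internal links is mechanical enough to script. The sketch below is a hypothetical helper, not a Yoast or Google tool: it assumes you have a dict mapping every live path on your site to its HTML, and it reports each internal link whose target path isn’t in that dict. The host `example.com` is a placeholder.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class HrefParser(HTMLParser):
    """Collects the href values of all <a> tags on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def broken_internal_links(pages, base="https://example.com"):
    """Return sorted (source_path, dead_target_path) pairs.

    `pages` maps each live path (e.g. "/about") to that page's HTML.
    """
    host = urlparse(base).netloc
    broken = set()
    for source, html in pages.items():
        parser = HrefParser()
        parser.feed(html)
        for href in parser.hrefs:
            target = urlparse(urljoin(base + source, href))
            path = target.path or "/"
            # Only internal links count; external sites are someone else's problem.
            if target.netloc == host and path not in pages:
                broken.add((source, path))
    return sorted(broken)
```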

Another common URL error is the one with the words ‘submitted URL’ in the title. These errors appear as soon as Google detects inconsistent behavior. On the one hand, you submitted the URL for indexing, so you’re telling Google: “Yes, I want you to index this page.” On the other hand, something else is telling Google: “No, don’t index this page.” A possible reason could be that your robots.txt file blocks your page. Or that the page is marked ‘noindex’ by a meta tag or HTTP header. If you don’t fix the inconsistent message, Google will not index your URL.
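You can check a submitted URL for those contradicting signals yourself. This sketch uses Python’s `urllib.robotparser` for the robots.txt part and a small `HTMLParser` subclass for the meta tag; `submitted_url_conflicts` is a hypothetical name, and the noindex checks are deliberately simplistic (a plain substring match on the meta `content` attribute and the `X-Robots-Tag` header).

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Records whether a <meta name="robots"> tag contains 'noindex'."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            if "noindex" in (attrs.get("content") or "").lower():
                self.noindex = True


def submitted_url_conflicts(url, robots_txt, html, headers, user_agent="Googlebot"):
    """List the signals that contradict submitting `url` for indexing."""
    conflicts = []
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(user_agent, url):
        conflicts.append("blocked by robots.txt")
    meta = RobotsMetaParser()
    meta.feed(html)
    if meta.noindex:
        conflicts.append("noindex meta tag")
    # HTTP header names are case-insensitive, so normalize before looking up.
    lowered = {name.lower(): value for name, value in headers.items()}
    if "noindex" in lowered.get("x-robots-tag", "").lower():
        conflicts.append("noindex X-Robots-Tag header")
    return conflicts
```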

Among these common errors, you might also find an occasional DNS error or server error for a specific URL. Re-check that URL later and see if the error has vanished. Be sure to use Fetch as Google and mark the error as fixed in Google Search Console if that is your primary monitoring tool.
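Re-checking later can be automated with a few spaced-out retries. In this sketch, `recheck` and the `fetch` helper it receives are hypothetical; `fetch` is assumed to return a status code, or `None` when the request failed outright. A status below 500 is taken to mean the transient DNS or server error has cleared. The ten-minute default delay is an arbitrary choice.

```python
import time


def recheck(url, fetch, attempts=3, delay_seconds=600, sleep=time.sleep):
    """Re-request a URL that showed a DNS or server error.

    Returns the first status below 500, or None if every attempt failed.
    """
    for attempt in range(attempts):
        status = fetch(url)
        if status is not None and status < 500:
            return status  # the transient error has cleared
        if attempt < attempts - 1:
            sleep(delay_seconds)  # wait before trying again
    return None
```

Passing `sleep` as a parameter keeps the function easy to test with a no-op instead of real waiting.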

Very specific URL errors

Some URL errors apply to certain sites only. That’s why I’d like to list these separately:

Mobile-specific URL errors. These are page-specific crawl errors that occur when a page is crawled the way a modern smartphone would see it. If you have a responsive website, they are unlikely to surface. You might run into more of these errors if you maintain a separate mobile subdomain like m.example.com, for instance because of faulty redirects from your desktop site to that mobile site. You might even have blocked part of that mobile site with a line in your robots.txt.
Malware errors. If you encounter malware errors in your webmaster tools, Bing or Google has found malicious software on that URL: software that is used, for instance, “to gather guarded information, or to disrupt their operation in general” (Wikipedia). You need to investigate that page and remove the malware.
Google News errors. There are some specific Google News errors. There’s quite a list of these possible errors in Google’s documentation, so if your website is in Google News, you might get these crawl errors. They vary from the lack of a title to errors that tell you that your page doesn’t seem to contain a news article. Be sure to check for yourself if this applies to your site.
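For the mobile case above, one way to catch faulty desktop-to-mobile redirects is to compare each redirect target with the page it came from. The check below rests on an assumption that may not hold for your site: that every desktop page has a mobile equivalent at the same path and query string on `m.example.com`. One mistake it flags is sending every desktop URL to the mobile homepage; `faulty_mobile_redirect` is a hypothetical name.

```python
from urllib.parse import urlparse


def faulty_mobile_redirect(desktop_url, redirect_target, mobile_host="m.example.com"):
    """Return None if the redirect maps to the matching mobile page, else a reason.

    Assumption (for illustration): https://www.example.com/shoes?size=9
    should redirect to https://m.example.com/shoes?size=9.
    """
    source = urlparse(desktop_url)
    target = urlparse(redirect_target)
    if target.netloc != mobile_host:
        return "redirects off the mobile host"
    if target.path != source.path:
        return "path changed (e.g. everything sent to the mobile homepage)"
    if target.query != source.query:
        return "query string dropped or altered"
    return None
```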

Fix your crawl errors

The bottom line of this article: if you encounter crawl errors, fix them. Checking for crawl errors now and then should be part of your site’s maintenance schedule.

Michiel Heijmans

Michiel was one of our very first employees and used to be a partner at Yoast. Kick-start your site optimization with his articles!

Discussion (23)

Hair Apr. 23, 2018

Great information! I will check now for errors in my website thanks to your post :)
Shah Apr. 23, 2018

Do these types of errors affect SEO?
Thanks.
- Willemien Hallebeek Apr. 23, 2018
  
  Yes, crawl errors can negatively affect your SEO!
Mens Apr. 22, 2018

True!
Great post! Thank you so much for this important information. I will share it with my friends for sure :)
Margaret Apr. 17, 2018

Thank you very much for an interesting and useful review!
Ritu Apr. 15, 2018

Collecting and fixing crawl errors falls under technical SEO. If we do it correctly, our pages can see improved rankings.
SEO Audit Apr. 14, 2018

A complete article about crawl errors. I was looking for this information.
Cletus Apr. 13, 2018

Please help: my site always gets crawled late, and most of my posts don’t make it to the front page.
- Willemien Hallebeek Apr. 17, 2018
  
  Hi Cletus, perhaps this post https://yoast.com/how-to-get-google-to-crawl-your-site-faster/ can help you fix the problem?
Sanjay Mishra Apr. 13, 2018

Thanks for your page! The information you shared helped me a lot!
Yuen Mi Apr. 12, 2018

I’ve been thinking lately about whether I should remove old articles or revamp them by applying some SEO basics. This made me rethink removing them (404 errors, internal linking, etc.). Thanks a lot for the useful information!
- Willemien Hallebeek Apr. 13, 2018
  
  Hi Yuen Mi, Perhaps this is also an interesting read for you: https://yoast.com/republish-old-content/
Firnandus Apr. 12, 2018

Yes, I had to work hard to repair these errors on my website myself. They can be caused by changes to URLs, and by links in other articles, which are sometimes included in the code on my website.
- Willemien Hallebeek Apr. 13, 2018
  
  Hi Firnandus, It can be a lot of work indeed! But worth the effort. If you have the Premium version of Yoast SEO, we can help you prevent crawl errors when changing URLs: https://yoast.com/wordpress/plugins/seo/redirects-manager/
Mark Apr. 12, 2018

I get a lot of crawl errors linked from spam sites. Hundreds a month. This wasn’t mentioned in the article. I don’t know whether to ignore them and let them pile up, keep deleting them or maybe a disavow file? The sites often format the URL incorrectly thus creating the 404 error in GSC. Contacting a website from China or Russia and asking them to stop is a scary thought as well.
- Michiel Heijmans Apr. 13, 2018
  
  Hi Mark,
  
  That’s a bummer. I think reaching out to a specialized company like Linkdetox would be a good idea? You can keep on disavowing these links, but a more thorough approach might be better.
Suresh Dubey Apr. 12, 2018

I was looking for this information. Thanks
Basit Ansari Apr. 12, 2018

Thank you so much for telling me importance of crawl error.
Hossein Apr. 12, 2018

Hi,
I recommend adding a notification badge icon in Yoast whenever a 404 URL is found.
That way we can find and fix these errors faster.
I also opened an issue about this on github:
https://github.com/Yoast/wordpress-seo/issues/9396
It would be very nice if you added this feature.
Thanks
- Willemien Hallebeek Apr. 13, 2018
  
  Thanks Hossein, we’ll have a look into it!
Bill Bennett Apr. 12, 2018

Is there a simple way to check for internal 404 errors before the Google crawl robot arrives?
- Willemien Hallebeek Apr. 12, 2018
  
  Hi Bill, You can use Screaming Frog https://www.screamingfrog.co.uk/seo-spider/ to find your 404s if you don’t want to use Google Search Console. Not sure if you will be faster than Google though, that probably depends on how new your site is and how often it gets crawled.
Local Sales Offer Apr. 11, 2018

Thank you for the well summarized article on crawl errors