Crawl errors occur when a search engine tries to reach a page on your website but fails. Let’s shed some more light on crawling first. Crawling is the process where a search engine tries to visit every page of your website via a bot. A search engine bot finds a link to your website and starts to find all your public pages from there. The bot crawls the pages and indexes all the contents for use in Google, plus adds all the links on these pages to the pile of pages it still has to crawl. Your main goal as a website owner is to make sure the search engine bot can get to all pages on the site. When this process fails, the result is what we call crawl errors.
Your goal is to make sure that every link on your website leads to an actual page. That might be via a 301 redirect, but the page at the very end of that link should always return a 200 OK server response.
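To make that rule concrete, here is a minimal Python sketch that follows a link through its redirects and checks whether the chain ends in a 200 OK. Everything in it is an assumption for illustration: `resolve_link`, `link_is_healthy`, the five-hop limit and the fake site are hypothetical names and values, and the fetcher is passed in as a function so the example runs without touching the network.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

# A fetcher takes a URL and returns (status_code, Location header or None).
# In real use it would wrap an HTTP client; here it is injected (assumption).
Fetcher = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = {301, 302, 303, 307, 308}


def resolve_link(url: str, fetch: Fetcher, max_hops: int = 5) -> Tuple[str, int, int]:
    """Follow redirects from `url`; return (final_url, final_status, hops)."""
    hops = 0
    while True:
        status, location = fetch(url)
        # Stop on a non-redirect, a redirect without a target, or too many hops.
        if status not in REDIRECT_CODES or location is None or hops >= max_hops:
            return url, status, hops
        url = urljoin(url, location)  # Location may be relative
        hops += 1


def link_is_healthy(url: str, fetch: Fetcher, max_hops: int = 5) -> bool:
    """A link is healthy when the page at the end of the chain returns 200."""
    _, status, _ = resolve_link(url, fetch, max_hops)
    return status == 200


# Hypothetical site used only to exercise the functions above.
FAKE_SITE = {
    "https://example.com/old": (301, "/new"),
    "https://example.com/new": (200, None),
    "https://example.com/gone": (404, None),
}


def fake_fetch(url: str) -> Tuple[int, Optional[str]]:
    return FAKE_SITE.get(url, (404, None))


print(link_is_healthy("https://example.com/old", fake_fetch))   # True
print(link_is_healthy("https://example.com/gone", fake_fetch))  # False
```

A chain that never stops redirecting runs into the hop limit and is reported as unhealthy, because its last status is still a redirect rather than a 200.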
Google divides crawl errors into two groups:
- Site errors. You don’t want these, as they mean your entire site can’t be crawled.
- URL errors. You don’t want these either, but since they only relate to one specific URL per error, they are easier to maintain and fix.
Let’s elaborate on that.
Site errors are all the crawl errors that prevent the search engine bot from accessing your website. There can be many reasons for that; these are the most common:
- DNS errors. This means a search engine isn’t able to communicate with your server. It might be down, for instance, meaning your website can’t be visited. This is usually a temporary issue. Google will come back to your website later and crawl your site anyway. If you see notices of this under crawl errors in your Google Search Console, that probably means Google has tried a couple of times and still wasn’t able to reach your server.
- Server errors. If your search console shows server errors, this means the bot wasn’t able to access your website. The request might have timed out: the search engine tried to visit your site, for instance, but it took so long to load that the server served an error message. Server errors also occur when there are flaws in your code that prevent a page from loading. It can also mean that your site has so many visitors that the server just couldn’t handle all the requests. A lot of these errors are returned as 5xx status codes, like the 500 and 503 status codes.
- Robots failure. Before crawling, a bot such as Googlebot fetches your robots.txt file first, to see if there are any areas on your website you’d rather not have crawled. If that bot can’t reach the robots.txt file, Google will postpone the crawl until it can reach the robots.txt file. So always make sure it’s available.
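Besides being reachable, your robots.txt should not block pages you actually want crawled. As a rough sketch, Python’s standard `urllib.robotparser` module can check a list of URLs against a robots.txt body offline. The helper name `blocked_urls`, the sample rules and the example URLs are assumptions made up for this illustration, and Python’s parser does not necessarily match Googlebot’s own parsing in every detail.

```python
from typing import Iterable, List
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: Iterable[str],
                 user_agent: str = "Googlebot") -> List[str]:
    """Return the URLs that the given robots.txt body disallows for user_agent."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

important_pages = [
    "https://example.com/blog/post",
    "https://example.com/private/page",
]

print(blocked_urls(SAMPLE_ROBOTS, important_pages))
# ['https://example.com/private/page']
```

Running a check like this against the pages in your sitemap is one way to spot a rule that blocks more than you intended.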
That covers the crawl errors related to your entire site. Now let’s see what kind of crawl errors might occur for specific pages.
As mentioned, URL errors refer to crawl errors that occur when a search engine bot tries to crawl a specific page of your website. When we discuss URL errors, we tend to discuss crawl errors like (soft) 404 Not Found errors first. You should frequently check for these types of errors (use Google Search Console or Bing Webmaster Tools) and fix ’em. If the page, or the subject of that page, is indeed gone and never to return to your website, serve a 410 Gone response. If you have similar content on another page, please use a 301 redirect instead. Make sure your sitemap and internal links are up to date as well, obviously.
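The choice between 200, 301, 410 and 404 can be written down as a small decision function. This is only a sketch of the rule of thumb above, not how any particular server or plugin implements it; `response_for`, `live_pages`, `moved` and `gone` are hypothetical names chosen for this example.

```python
from typing import Dict, Optional, Set, Tuple


def response_for(path: str, live_pages: Set[str], moved: Dict[str, str],
                 gone: Set[str]) -> Tuple[int, Optional[str]]:
    """Return (status_code, redirect_target) for a requested path."""
    if path in live_pages:
        return 200, None          # the page exists: serve it
    if path in moved:
        return 301, moved[path]   # similar content lives elsewhere: redirect
    if path in gone:
        return 410, None          # removed on purpose, never coming back
    return 404, None              # unknown URL: investigate if it keeps showing up


# Hypothetical site inventory, for illustration only.
live_pages = {"/", "/blog/crawl-errors/"}
moved = {"/blog/old-crawl-post/": "/blog/crawl-errors/"}
gone = {"/summer-sale-2015/"}

print(response_for("/blog/old-crawl-post/", live_pages, moved, gone))
# (301, '/blog/crawl-errors/')
print(response_for("/summer-sale-2015/", live_pages, moved, gone))
# (410, None)
```

Keeping the `moved` and `gone` lists in one place also makes it easier to keep your sitemap in sync with what the server actually returns.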
We found that a lot of these URL errors are caused by internal links, by the way. So a lot of these errors are your fault. If you remove a page from your site at some point, adjust or remove any internal links pointing to it as well. These links have no use anymore. If such a link remains in place, a bot will find it and follow it, only to hit a dead end (a 404 Not Found error) on your own website. You need to do some maintenance now and then on your internal links!
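That maintenance can be partly automated. The sketch below uses Python’s standard `html.parser` to collect the links on a page and report the internal ones that point at paths you have removed. The class `LinkCollector`, the function `dead_internal_links` and the sample HTML are assumptions for illustration; a real check would run over every page of the site.

```python
from html.parser import HTMLParser
from typing import List, Set
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def dead_internal_links(page_url: str, html: str, removed_paths: Set[str]) -> List[str]:
    """Return hrefs on the page that point to removed paths on the same host."""
    collector = LinkCollector()
    collector.feed(html)
    site_host = urlparse(page_url).netloc
    dead = []
    for href in collector.hrefs:
        target = urlparse(urljoin(page_url, href))  # resolve relative links
        if target.netloc == site_host and target.path in removed_paths:
            dead.append(href)
    return dead


# Hypothetical page content, for illustration only.
SAMPLE_HTML = (
    '<p><a href="/old-page">Old</a> '
    '<a href="/blog/">Blog</a> '
    '<a href="https://other.example/old-page">External</a></p>'
)

print(dead_internal_links("https://example.com/about/", SAMPLE_HTML, {"/old-page"}))
# ['/old-page']
```

The external link is ignored even though its path matches, because only links on your own host are yours to fix.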
Among these common errors might be an occasional DNS error or server error for that specific URL. Re-check that URL later and see if the error has vanished. Be sure to use Fetch as Google and mark the error as fixed in Google Search Console if that is your main monitoring tool. Our plugin can help you with that.
Very specific URL errors
There are some URL errors that apply to certain sites only. That’s why I’d like to list these separately:
- Mobile-specific URL errors. This refers to page-specific crawl errors that occur on a modern smartphone. If you have a responsive website, these are unlikely to surface, except perhaps for that piece of Flash content you already wanted to replace. If you maintain a separate mobile subdomain like m.example.com, you might run into more errors. Think along the lines of faulty redirects from your desktop site to that mobile site. You might even have blocked some of that mobile site with a line in your robots.txt.
- Malware errors. If you encounter malware errors in your webmaster tools, this means Bing or Google has found malicious software on that URL. That might be software that is used, for instance, “to gather guarded information, or to disrupt their operation in general” (Wikipedia). You need to investigate that page and remove the malware.
- Google News errors. There are some specific Google News errors. There’s quite a list of these possible errors in Google’s documentation, so if your website is in Google News, you might get these crawl errors. They vary from the lack of a title to errors that tell you that your page doesn’t seem to contain a news article at all. Be sure to check for yourself if this applies to your site.
Fix your crawl errors
The bottom line in this article is definitely: if you encounter crawl errors, fix them. It should be part of your site’s maintenance schedule to check for crawl errors now and then. Besides that, if you have installed our premium plugin, you’ll have a convenient way in WordPress and/or TYPO3 to prevent crawl errors when, for instance, deleting a page. Be sure to check these features yourself!