At the beginning of the week Google rolled out some pretty sweeping changes to GWT Crawl Errors. Changes to the interface design, along with the way and amount of data recorded, have resulted in a tidier, more efficient means of tracking problems on your site(s). There is even the added dimension of seeing crawl error types displayed in various languages.
To begin with, we can now see trends over time via the graph. Being able to get a sense of the progress being made, or lack thereof, is great. It's an easy way to understand, at a glance, the general state of affairs on your site over the past 3 months or so.
The overarching goal in this round of updates from Google seems to be the simplification and organisation of data. It is with this in mind that they have divided errors into two main categories:
1. Site Errors – These are errors which affect your entire site. They are things like server issues, DNS resolution failures, and the inability to fetch the robots.txt file.
2. URL Errors – Not found (404) errors, soft 404s, not followed, access denied etc.
When a crawl error type is selected, we are then presented with the familiar list of pages with errors below the graph, but with significant changes. This section has been at the heart of most of the criticism aimed at Google since the updates. Where it was previously possible to access up to 100,000 URLs with each type of error, the number has now been limited to 1,000.
The reason, according to Google:
Trying to consume all this information was like drinking from a fire-hose, and you had no way of knowing which of those errors were important (your homepage is down) or less important (someone’s personal site made a typo in a link to your site). There was no realistic way to view all 100,000 errors—no way to sort, search, or mark your progress.
The reason why Google is wrong, according to Vanessa Fox:
There were absolutely realistic ways to view, sort, search, and mark your progress. The CSV download made all of this easy using Excel. And more data is always better to see patterns, especially for large scale sites with multiple servers, content management systems, and page templates.
Before we pass judgement on who's right or wrong here, it's important to note two other new aspects of the URL list.
1. Priority Listings – Google has helped us out by ranking the URL errors in order of importance, prioritising those where there is a real possibility of you being able to resolve the issue. The ranking is based on a multitude of factors, including sitemap listing (is it in there?), backlink profile (how many links are pointing to the page?), and page popularity (does it attract a relatively high level of traffic?).
2. Fixed Marker – If you believe that an error has been resolved, you can now mark it off using the Fixed checkbox on the left. The error will be removed from the list, but will return if Google finds the issue has re-occurred.
While it may be preferable to have access to 100,000 URLs of each error type, this way of working seems far more organised and focused. As you work your way through the top 1,000, the next batch appears, and so on. It ensures that the most significant errors are targeted first.
Furthermore, the new set-up lends itself to those times when the errors have to be relayed to a client or third party agency to get fixed; if a client receives 100,000 errors for redirection, not a lot will get done. Drip-feeding those errors through in smaller, targeted batches is a far more palatable and manageable system.
Other improvements made include the reworking of the way we are presented with error details. Clicking on an error page URL opens a “detail pane” which shows further information:
Pages linked from
When the issue was first detected
Mark as fixed
Fetch as Googlebot
The last of these options is particularly helpful, allowing you to check that the URL error is fixed before marking it as such. The detail pane as a whole is another example of the new sense of coherence throughout the crawl error section: within this one box, you can access the pages which link to the problem URL, check that the error has been resolved using the Fetch as Googlebot button, and then mark it off as fixed.
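If you want a quick sanity check of your own before ticking the Fixed box, you can fetch the URL and look at its HTTP status. A minimal sketch in Python using only the standard library (the helper names and the example URL are my own illustrations, not part of any Google tooling):

```python
import urllib.request
import urllib.error


def looks_resolved(status_code: int) -> bool:
    """Treat any 2xx response as evidence the error has been fixed."""
    return 200 <= status_code < 300


def fetch_status(url: str) -> int:
    """Return the HTTP status code for a URL."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 404s, 403s etc. raise HTTPError but still carry a status code.
        return err.code


# Hypothetical usage: only mark the URL as fixed once it returns a 2xx.
# if looks_resolved(fetch_status("https://example.com/repaired-page")):
#     print("Safe to tick the Fixed checkbox")
```

This is only a rough proxy for Fetch as Googlebot, which fetches the page as Google's own crawler would, so treat it as a first check rather than a substitute.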
An option which is no longer available is the ability to download crawl source errors. This is a slight loss, as more often than not there would be a host of errors emanating from the same source for a variety of error page URLs. Being able to download and filter by source error enabled us to see which pages on our own site were harbouring any rogue links, or which external sites included misspelled links to our site.
In the latter case, though, the resolution is most often a 301 redirect; rather than contacting various sites, it is far easier to redirect the broken link to the correct URL on your site. With this in mind, losing the ability to download screeds of error source data is no great hardship. For the majority of cases, having the linked-from data within the detail pane is sufficient.
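For a site running on Apache with mod_alias enabled, that kind of redirect can be a one-liner in your .htaccess file. A minimal sketch, with hypothetical paths standing in for the misspelled and correct URLs:

```apache
# Permanently (301) redirect a commonly misspelled inbound URL
# to the correct page. Both paths below are placeholder examples.
Redirect 301 /old-misspelt-page /correct-page
```

Once in place, any visitor (or crawler) following the broken external link lands on the right page, and the 404 should eventually drop out of your crawl errors list.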
All in all, I'm quite a fan of the latest updates to the crawl errors section of Google Webmaster Tools. I even like the educational aspect of the ever-changing language within the crawl error section on the dashboard. Where else could you learn that “nicht gefunden” is German for “not found”? Alas, the dynamic language rotator's days are numbered according to John Mueller, Google Webmaster Trends Analyst, who replied to growing numbers of concerned users with this cheeky little snippet:
Hi everyone, we're aware of the language mix-up regarding the labels there, sorry for the confusion! We hope to have that resolved shortly. In other news, this is your chance to learn what various crawl errors are called in other languages.
Here are a few to get you started.
German – Nicht Gefunden (Not Found)
Turkish – Sunucu hatası (Server Error)
Russian - Ошибка 404 (Soft 404)
Dutch - Niet gevolgd (Not Followed)
You learn something new every day.