Google recently released previously unavailable index statistics that you can access through Google’s webmaster tools. This is an interesting development that may seem too good to be true, but it appears to be real and not smoke and mirrors. You can now view the number of pages on your site that were indexed during the past year.
Total Indexed Count
According to Google, this count is precise and accurate. If your site has duplicate URLs with canonical attributes, or if Google has clustered any duplicate URLs together, the count does not include the duplicates. The data is charted over time, going back as far as a year, so you can see how your pages were indexed over that period. There is a lag to consider, with new results not appearing for a couple of weeks, so this tool is better suited to studying trends over time than to assessing real-time indexing.
Advanced Option for Additional Information
You can learn even more about how Google has indexed your website by viewing the advanced options. You can use these advanced statistics in some innovative ways if you know what you’re doing. For example, if you add up (1) total indexed, (2) not selected, and (3) blocked by robots, the sum tells you the precise number of URLs that Google is looking at. However, the list of blocked URLs is available through the API and not in the UI.
According to Google, there are various reasons why a URL may not be selected for indexing. For example, a URL is left out if it is canonicalized to a different page, if it redirects to a different page, or if Google determines that its content is similar enough to the content at another URL and has selected the other URL as the legitimate one. Blank pages are also excluded from indexing.
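The addition described above can be sketched in a few lines of Python. This is only an illustration: the function name and the sample figures are hypothetical, and the three counts are assumed to have been copied by hand from the advanced view.

```python
def urls_under_consideration(total_indexed: int,
                             not_selected: int,
                             blocked_by_robots: int) -> int:
    """Add the three advanced counts to get the number of URLs Google is looking at."""
    counts = {
        "total_indexed": total_indexed,
        "not_selected": not_selected,
        "blocked_by_robots": blocked_by_robots,
    }
    # A negative count would mean the figures were copied incorrectly.
    for name, value in counts.items():
        if value < 0:
            raise ValueError(f"{name} cannot be negative: {value}")
    return sum(counts.values())


# Hypothetical figures, not taken from any real site.
print(urls_under_consideration(12_000, 45_000, 3_000))  # 60000
```

Because the chart lags by a couple of weeks, a sum like this is best compared from one period to the next instead of being read as a live figure.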
The ‘Ever Crawled’ Number
This figure is the number of URLs that Google has ever crawled. For one site, Google may have crawled through over 2 million URLs, yet it may currently be considering only a small fraction of that number. So where do the larger figures come from, and what happens to the other pages? Keep in mind that these figures pertain only to HTML files, and that the count is ‘ever’ crawled, not ‘currently’ crawled, meaning that it probably contains its fair share of 404s.
This figure may seem pretty useful at first glance, but it may be harder to use than you think. If you find that your ‘ever crawled’ figure is much smaller than the number of pages on your website, that indicates that you need to do some work to gain more visibility and rank better in Google. But if the ‘ever crawled’ figure is much higher than the size of your website, it may not give you much to work with.
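As a rough sketch of that comparison, the snippet below divides the ‘ever crawled’ figure by the number of pages on the site and labels the result. The function name and the 0.5 and 2.0 cutoffs are hypothetical choices made for illustration; Google does not publish thresholds for what counts as "much smaller" or "much higher".

```python
def crawl_coverage(ever_crawled: int, site_pages: int,
                   low: float = 0.5, high: float = 2.0) -> str:
    """Label how the 'ever crawled' count compares with the size of the site.

    low and high are hypothetical cutoffs, not values published by Google.
    """
    if site_pages <= 0:
        raise ValueError("site_pages must be a positive number")
    ratio = ever_crawled / site_pages
    if ratio < low:
        # Far fewer URLs crawled than pages exist: visibility work is needed.
        return "under-crawled"
    if ratio > high:
        # Far more URLs crawled than pages exist: likely old URLs and 404s.
        return "inflated"
    return "roughly in line"


# Hypothetical example: 2 million URLs ever crawled on a 50,000-page site.
print(crawl_coverage(2_000_000, 50_000))  # inflated
```

An "inflated" result says little on its own, since the count includes URLs that no longer exist.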