Google today announced the release of a beta program that gives site owners and search engine marketers more control over which pages are indexed by “googlebot”.
The experimental program (which, according to Google engineer Shiva Shivakumar, will “either fail miserably or succeed beyond our wildest dreams”) is called Google Sitemaps and aims to optimize Google’s crawling activities. This is achieved by:
- The website’s owner (or search engine marketer) creating an XML file that describes the site’s content;
- Uploading this file to the domain that is to be indexed;
- Waiting for the googlebot to arrive and read the file; and
- The googlebot then crawling the site as described by the file.
The file contains such information as:
- Which pages are to be indexed;
- When those pages were last updated; and
- The importance of each page relative to the site’s other pages.
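As an illustrative sketch, a minimal sitemap file covering the three pieces of information above might look like the following (the element names follow the published Sitemap protocol; the URLs and dates are hypothetical):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <!-- One <url> entry per page to be indexed -->
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-06-03</lastmod>   <!-- when the page was last updated -->
    <priority>1.0</priority>        <!-- importance relative to other pages (0.0–1.0) -->
  </url>
  <url>
    <loc>http://www.example.com/products/widget.html</loc>
    <lastmod>2005-05-20</lastmod>
    <priority>0.5</priority>
  </url>
</urlset>
```

Note that the priority value only ranks pages against one another within the same site; it does not influence how the site ranks against other sites.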
Webmasters have long been able to control which pages are excluded by using a robots.txt file, which stipulates the pages that are not to be indexed. Sitemaps provides even further control by also detailing which pages are to be included.
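For comparison, robots.txt works purely by exclusion. A hypothetical file that keeps the googlebot out of a single directory while leaving the rest of the site crawlable might look like:

```
User-agent: Googlebot
Disallow: /private/
```

The sitemap file complements this: robots.txt says what to leave out, while the sitemap says what to take in (and how important each page is).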
Using an XML-based file to improve the number of pages being indexed is not new in search engine technology. The process has been around since circa 2000 and is commonly referred to as “Trusted Feed”. Yahoo! currently runs the largest Trusted Feed program, called Yahoo! Search Submit Pro (of which ineedhits is a provider).
The key difference between the two programs is that the Yahoo! XML file is far more comprehensive in the information it provides, and it is delivered directly to the search engine. The Google program still requires the domain to be crawled before the file is read, and the optimization must still be done on the pages themselves. With Yahoo!, the optimization is done in the file and can be updated without making any site changes.
A future post will compare the two programs in more detail.
Personally, I think this is a clever move by Google in an area where they have not had a solid solution in the past. It will give Google guidance on how to obtain content that was previously inaccessible to the spider. It should grow the size of the index considerably, while improving the amount of “deep content” in the index. Deep content generally has more detail and provides far greater value to users.
Is it paid inclusion? The simple answer is “no”, although the complex answer is “maybe”. This is a free service provided by Google, and as with anything free, there are no guarantees that the file will be followed. Of course, the file will do no good at all if your site is not being found by the Google spider in the first place.