Welcome to the ineedhits Search Engine Marketing blog, where we share the latest search engine and online marketing news, releases, industry trends and great DIY tips and advice.
What is a search engine spider?
A search engine spider, also called a crawler or bot, is a program designed to browse the Internet in a systematic, automated manner and retrieve information about websites.
Search engine spiders extract information from the pages they visit and store it in a way that allows search engines to process and index the data, and to quickly retrieve the relevant parts in response to search queries.
How do spiders work?
Search engine spiders use the hyperlinks contained on web pages to move from one website to the next, or crawl from one web page to another if you prefer. A spider starts with a list of URLs to visit. As it visits each URL, it identifies the hyperlinks on that page and adds the hyperlinked pages to the list of URLs to visit, thus expanding the so-called “crawl frontier”. The hyperlinked pages could belong to the same website or be links to external pages.
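The loop described above can be sketched in a few lines of Python. This is only an illustration, not how any particular search engine works: the function names (crawl, fetch), the LinkExtractor class and the three example.com/example.org pages are made up for the example, and a small in-memory dictionary stands in for real HTTP requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first crawl. `fetch(url)` returns the page HTML, or None."""
    frontier = deque(seed_urls)  # the crawl frontier: URLs still to visit
    seen = set(seed_urls)        # every URL ever queued, to avoid repeats
    visited = []                 # URLs in the order they were crawled

    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue  # page could not be retrieved
        visited.append(url)

        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return visited


# Hypothetical pages standing in for the real web (assumption for the demo).
PAGES = {
    "http://example.com/": '<a href="/about">About</a> <a href="http://example.org/">Partner</a>',
    "http://example.com/about": '<a href="/">Home</a>',
    "http://example.org/": "<p>No links here.</p>",
}

order = crawl(["http://example.com/"], PAGES.get)
print(order)
```

Starting from a single seed URL, the sketch reaches both the internal page and the external site, and the link back to the homepage is skipped because it has already been queued.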
With more and more web pages being added to the World Wide Web, and existing web pages being changed and updated frequently, one of the main challenges for search engine spiders is to efficiently crawl as many new and updated web pages as possible. Because of this challenge, search engine spiders use a set of rules that help them determine which pages to crawl, how often to crawl them, and how to distribute the activities of multiple spiders that are crawling the web at the same time.
Search engines generally have multiple copies of their spiders crawling the web at the same time. They do not publish much information about exactly which rules their spiders follow, so that search engine spammers cannot use this information to manipulate crawls (and ultimately search engine rankings). In general, search engine spiders are guided by some measure of website importance as expressed in the quality and popularity of a site. They can also “learn” how often pages are updated and when is a good time to spider the page again for new content. This is good for both the search engine and the webmaster, as it uses less bandwidth.
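Since the search engines do not publish their rules, the following is purely a guess at what “learning” an update frequency could look like. It assumes a simple scheme: revisit sooner when a page has changed since the last visit, and wait longer when it has not. The function names, the halving and doubling rule, and the 1 to 64 day limits are all assumptions made for the illustration.

```python
import hashlib

MIN_INTERVAL_DAYS = 1    # assumed lower bound on the revisit interval
MAX_INTERVAL_DAYS = 64   # assumed upper bound on the revisit interval


def content_fingerprint(html):
    """Returns a hash that changes whenever the page content changes."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def next_interval(current_days, old_fingerprint, new_fingerprint):
    """Halves the interval if the page changed, doubles it otherwise."""
    if old_fingerprint != new_fingerprint:
        proposed = current_days // 2
    else:
        proposed = current_days * 2
    # Keep the result within the assumed limits.
    return max(MIN_INTERVAL_DAYS, min(MAX_INTERVAL_DAYS, proposed))


# Simulated visits: the page stays the same twice, then changes.
snapshots = ["<p>v1</p>", "<p>v1</p>", "<p>v1</p>", "<p>v2</p>"]
interval = 8
history = []
previous = content_fingerprint(snapshots[0])
for html in snapshots[1:]:
    current = content_fingerprint(html)
    interval = next_interval(interval, previous, current)
    history.append(interval)
    previous = current

print(history)
```

In this sketch a page that never changes is visited less and less often, which is where the bandwidth saving comes from.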
What do spiders read?
What do spiders ignore?
Can spiders do any damage?
How to make your site spider friendly
Try to provide search engine spiders with an easy way to navigate through your site (e.g. through a sitemap or through HTML links) and provide them with plenty of HTML copy to index.
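One way to provide a sitemap is an XML file in the sitemaps.org format, which lists the URLs you want spiders to find. The sketch below builds such a file with Python's standard library. It assumes the XML flavour of sitemap (an ordinary HTML page of links serves the same purpose), and the build_sitemap function and the example.com URLs are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Returns a minimal XML sitemap listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages of a small site (assumption for the demo).
sitemap_xml = build_sitemap([
    "http://www.example.com/",
    "http://www.example.com/products.aspx",
])
print(sitemap_xml)
```

The resulting text would typically be saved as sitemap.xml in the root of the site.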
And, of course, make sure search engine spiders find your site in the first place – that means you need incoming links! Directory listings are a good source of incoming links, and you should also request links to your site from relevant, related websites (e.g. supplier, industry association or customer websites).
Important search engine spiders
Spiders have names, just like browsers do. All good web statistics programs should give you a report on spiders that have crawled your site. Alternatively, you can check your log files. Here are the names of the most important search engine spiders (thanks to http://www.jafsoft.com/searchengines/webbots.html – check out the full list on their site):
Google’s Spider: Googlebot
Yahoo!’s Spider: Slurp (the Inktomi spider)
MSN’s Spider: MSNBOT
Ask’s Spider: teoma_agent1
Abacho’s Spider: AbachoBOT
Aesop’s Spider: AESOP_com_SpiderMan
Alexa’s Spider: ia_archiver
AltaVista’s Spider: Scooter or Mercator
AllTheWeb’s Spider: FAST-WebCrawler
Baidu’s Spider: Baiduspider
Entireweb’s Spider: Speedy Spider
Excite’s Spider: ArchitextSpider
Infoseek’s Spider: UltraSeek or InfoSeek Sidewinder
Looksmart’s Spider: MantraAgent
Lycos’ Spider: Lycos_Spider_(T-Rex)
Mirago’s Spider: HenryTheMiragoRobot
ScrubTheWeb’s Spider: Scrubby/
Singingfish’s Spider: asterias
WiseNut’s Spider: ZyBorg
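If you want to check your log files yourself, a short script can count visits from the spiders named above. The sketch below assumes your server writes the common “combined” log format, where the user agent is the last quoted field on each line. The count_spider_visits function, the sample log lines and the IP addresses in them are made up for the illustration, and only a few of the names from the list are included.

```python
import re
from collections import Counter

# A few spider names from the list above; extend as needed.
SPIDER_NAMES = ["Googlebot", "Slurp", "MSNBOT", "Baiduspider", "ia_archiver"]

# In the combined log format the user agent is the last quoted field.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def count_spider_visits(log_lines):
    """Counts requests per spider by matching names in the user agent."""
    counts = Counter()
    for line in log_lines:
        match = USER_AGENT_RE.search(line)
        if not match:
            continue  # line is not in the expected format
        user_agent = match.group(1).lower()
        for name in SPIDER_NAMES:
            if name.lower() in user_agent:
                counts[name] += 1
                break
    return counts


# Made-up sample lines (assumption for the demo).
SAMPLE_LOG = [
    '66.249.66.1 - - [10/Jul/2006:10:00:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '72.30.0.1 - - [10/Jul/2006:10:05:00 +0000] "GET /about HTTP/1.0" 200 2048 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '10.0.0.5 - - [10/Jul/2006:10:06:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '66.249.66.1 - - [10/Jul/2006:11:00:00 +0000] "GET /about HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

counts = count_spider_visits(SAMPLE_LOG)
print(counts)
```

Keep in mind that a user agent can be faked, so a matching name only suggests that the visit came from that spider.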
If you have any specific questions about search engine spiders, let us know and we’ll do our best to provide you with answers!
I am building a new site in .net. Should I rebuild in HTML?
By Anonymous - July 10, 2006
Hi there,
.net is fine (we use it ourselves – see the .aspx endings of our URLs!). To be on the safe side, the important thing is that you create static pages with HTML content (look at our homepage for example – it’s written in .net but the output is a static page, it’s not database-driven). And as long as there aren’t too many database parameters, search engines should also be able to crawl dynamic pages.
By Nancy Hackett - July 11, 2006
Do spiders read html comments?
By Gorka - October 2, 2006
Is this statement true: “If a new web site shows up, the spiders will appear every three days looking for more. If they find nothing new, they come back every 10 days. If the site is static, they stop coming.”
By Anonymous - October 10, 2008