Monday, September 20, 2010

The Definitive Guide to Robots.txt for SEO

Posted by Rene LeMerle @ 9:39 pm

If you're serious about your SEO, then a robots.txt file should definitely be part of your strategy.

Surprisingly, many website owners never create a robots.txt file for their site, let alone maintain one, so here's a guide to what a robots.txt file is and how best to use it for SEO purposes.

What is a robots.txt file?

A robots.txt file sets out restrictions for the search engine robots (known as "bots") that crawl the web. Search engines use these robots to find content to index in their databases.

These bots are automated, and before they access any section of a site, they check whether a robots.txt file exists that tells them to stay out of certain pages. Bear in mind that compliance is voluntary: well-behaved crawlers honor the file, but it is not an access-control mechanism.

The robots.txt file is a plain text file (no HTML) that must be placed in your root directory - crawlers won't look for it anywhere else - for example:

http://www.yourdomain.com/robots.txt

Why use a robots.txt file?

There are three primary reasons for using a robots.txt file on your website (a combined example follows this list):

  1. Information you don't want made public through search
    Where you have content on your website that you don't want surfacing via searches, robots.txt tells search engines to leave it out of their index. (Remember this only deters compliant crawlers; genuinely sensitive content needs real access control.)
  2. Duplicate content
    Often the same content is presented on a website under several URLs (e.g. the same blog post might appear under various categories). Duplicate content can incur search engine penalties, which is bad from an SEO point of view. The robots.txt file helps you control which version of the content the search engines include in their index.
  3. Managing bandwidth usage
    Some websites have limited bandwidth allowances (based on their hosting packages). Since robots use up bandwidth when crawling your site, in some instances you might want to stop certain user agents from accessing parts of your site to conserve bandwidth.
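Here's a minimal sketch of how all three reasons might come together in one file. The directory names and the "somebot" user agent are hypothetical placeholders:

User-agent: *
# Reason 1: keep private content out of search results
Disallow: /private/
# Reason 2: only one URL version of each post gets crawled
Disallow: /category/

# Reason 3: block a bandwidth-hungry crawler entirely
User-agent: somebot
Disallow: /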

How to create a robots.txt file?

The robots.txt file is just a plain text file. To create your own, open a new document in a simple text editor (e.g. Notepad).

The content of a robots.txt file consists of "records", which tell specific search engine robots which parts of the site they may not access.

Each of these records consists of two fields: a User-agent line (which specifies the robot to control) and one or more Disallow lines. Here's an example:

User-agent: googlebot
Disallow: /admin/

This example record allows "googlebot", Google's spider, to access every page of a site except files in the "admin" directory; everything in "admin" will be ignored.

If you want only specific pages excluded, you need to specify the exact file. For example:

User-agent: googlebot
Disallow: /admin/login.html

Should you want your entire site and all its content to be indexed, simply leave the Disallow line blank.
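
A minimal example (the empty Disallow value means nothing is off limits):

User-agent: googlebot
Disallow:
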
If you want all the search engine robots to treat your content the same way, you can use a generic user-agent record, which controls all of them at once:

User-agent: *
Disallow: /admin/
Disallow: /comments/

How to find which User Agents to control?

The first place to look for a list of the robots currently crawling your website is your server log files.

For SEO purposes, you’ll generally want all search engines indexing the same content, so using “User-agent: *” is the best strategy.

If you want to get specific with your user agents, you can find a comprehensive list at http://www.user-agents.org/
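
If you do need to single out one crawler, give it its own record. For instance, this sketch shuts one particular bot out of the entire site while the rest fall back to the generic rules ("msnbot" is used purely as an illustration):

User-agent: msnbot
Disallow: /

User-agent: *
Disallow: /admin/

A bot obeys the record that most specifically names it, so in this sketch msnbot ignores the * rules entirely.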

Common Robots.txt Mistakes to Avoid

If you don't format your robots.txt file properly, some or all of your website's files might not get indexed by search engines. To avoid this, watch out for the following:

  1. Don't rely on an "Allow" command. Everything is allowed by default, and while some engines (notably Google) support an Allow extension, it isn't part of the original standard.
  2. Don't add the Disallow line above the user agent. Ensure the User-agent line always sits above the Disallow commands.
  3. Don't add more than one file or directory to a single Disallow line (see the example after this list).
  4. Match the case of file and directory names exactly. Names on your server are case sensitive, so getting the case wrong will cause rules to be ignored.
  5. Avoid listing all files in a directory. If you want the whole directory ignored, a single directory entry is enough.
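
To illustrate points 3 and 5, a quick sketch (the directory names are only examples):

# Wrong: multiple paths crammed onto one Disallow line
Disallow: /admin/ /comments/

# Right: one path per Disallow line
Disallow: /admin/
Disallow: /comments/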

If you want to check that your robots.txt file is implemented correctly, visit Google Webmaster Tools, which lets you test your robots.txt. Google automatically retrieves the robots.txt from your website in real time.
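
If you'd rather script a quick check yourself, Python's standard library includes a robots.txt parser. Here's a minimal sketch, assuming the placeholder domain and the example rules used earlier in this post:

from urllib.robotparser import RobotFileParser

# Point the parser at the live file (the domain is a placeholder)
rp = RobotFileParser()
rp.set_url("http://www.yourdomain.com/robots.txt")
rp.read()  # fetch and parse the file

# can_fetch(user_agent, url) reports whether that agent may crawl the URL
print(rp.can_fetch("googlebot", "/admin/login.html"))  # False if disallowed
print(rp.can_fetch("*", "/"))  # True if the root is open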

So if you haven't created a robots.txt file for your website yet, get going.

And it's always a good idea to review your robots.txt file regularly: websites change constantly, so make sure yours stays up to date.



Rene LeMerle is the marketing manager of ineedhits.com - a global search engine marketing company. He also leads the marketing for Gooruze.com - a web 2.0 style community for online and digital marketers. Rene has been in the industry since 1997, with much of that time spent helping businesses embrace the best of the internet and digital world.






