How To Use The Robots.txt File?

The robots.txt file is often overlooked. Yet it plays an important role in getting webpages indexed properly, and it is very easy to set up.

I know that robots.txt is not something new. But I’ve been preparing an SEO sheet for a while and wanted to share this small & useful portion with you.

What is robots.txt?

Robots.txt is a file that is used to exclude content from the crawling process of search engine spiders / bots. The convention behind it is called the Robots Exclusion Protocol.

Why use robots.txt?

In general, we prefer that our webpages are indexed by the search engines. But there may be some content that we don’t want to be crawled & indexed: a personal images folder, a website administration folder, a web developer’s test folder for a customer, folders with no search value like cgi-bin, and many more. The main idea is that we don’t want them to be indexed.

Is the robots.txt file a guaranteed solution?

No. Standards-based bots like those of Google, Yahoo or other big search engines obey your robots.txt file because they are programmed to. But compliance is voluntary: any bot can be configured to ignore the robots.txt file. Result: there is no guarantee.

How to use the robots.txt file?

The robots.txt file has some simple directives which manage the bots. These are:

  • User-agent: defines which bots the following directives apply to. * is a wildcard which means all bots, while a specific name such as Googlebot targets only Google’s bot.
  • Disallow: defines which folders or files will be excluded. An empty value means nothing will be excluded, / means everything will be excluded, and /foldername/ or /filename excludes specific paths. The value is matched as a prefix of the URL path: /foldername/ (with a trailing slash) excludes everything inside that folder, while /foldername (without it) excludes every path that starts with that text, so it also covers /foldername.html and /foldername-old/.
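
To see how the Disallow value is matched, here is a minimal Python sketch built on the standard library's urllib.robotparser. The bot name ExampleBot, the allowed helper and the /private paths are made up for illustration. Real search engine bots use their own parsers, so treat this as an approximation of how they behave.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules: str, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may crawl `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


# Hypothetical rule sets, one per case described in the list above.
WITH_SLASH = "User-agent: *\nDisallow: /private/"
WITHOUT_SLASH = "User-agent: *\nDisallow: /private"
NOTHING = "User-agent: *\nDisallow:"
EVERYTHING = "User-agent: *\nDisallow: /"

print(allowed(WITH_SLASH, "/private/photo.jpg"))  # False: inside the folder
print(allowed(WITH_SLASH, "/private.html"))       # True: prefix does not match
print(allowed(WITHOUT_SLASH, "/private.html"))    # False: prefix matches
print(allowed(NOTHING, "/private/photo.jpg"))     # True: empty value excludes nothing
print(allowed(EVERYTHING, "/index.html"))         # False: / excludes everything
```

The second and third lines show the difference the trailing slash makes: the rule is a plain prefix, not a folder name.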

There are also some other directives which are not supported by all bots. These are:

  • Allow: works as the opposite of Disallow. You can list the content that is allowed to be crawled here, typically a file or subfolder inside a disallowed folder. Some bots also accept * as a wildcard in the path.
  • Request-rate: defines the crawl rate as pages/seconds. 1/20 means 1 page every 20 seconds.
  • Crawl-delay: defines how many seconds to wait after each successful crawl.
  • Visit-time: defines the hours between which you want your pages to be crawled. Example usage: 0100-0330, which means that pages will be crawled between 01:00 AM and 03:30 AM GMT.
  • Sitemap: shows where your sitemap file is. You must use the complete URL address of the file.
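
As a rough illustration of how a bot might read these extra directives, the sketch below parses a made-up robots.txt with Python's urllib.robotparser (site_maps() needs Python 3.8 or later). The www.example.com URL, the /admin/ paths and the ExampleBot name are assumptions for the demo. Visit-time is left out because, as far as I know, this parser does not expose it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; Allow is listed before Disallow because
# this parser applies the first rule that matches the path.
ROBOTS_TXT = """\
User-agent: *
Allow: /admin/help.html
Disallow: /admin/
Crawl-delay: 10
Request-rate: 1/20
Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

delay = parser.crawl_delay("ExampleBot")   # seconds to wait between crawls
rate = parser.request_rate("ExampleBot")   # named tuple: (requests, seconds)
sitemaps = parser.site_maps()              # list of sitemap URLs, or None

print(delay)                        # 10
print(rate.requests, rate.seconds)  # 1 20
print(sitemaps)                     # ['http://www.example.com/sitemap.xml']
print(parser.can_fetch("ExampleBot", "/admin/help.html"))  # True
print(parser.can_fetch("ExampleBot", "/admin/users"))      # False
```

A crawler that respects these values would sleep for the delay between requests and skip any path where can_fetch returns False.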

Robots.txt example:

User-agent: * # the rules below apply to all search engine spiders.
Disallow: /secretcontent/ # tells them not to crawl the secretcontent folder.

Resources:
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40360
http://www.robotstxt.org/
http://www.searchtools.com/robots/robots-txt.html
http://en.wikipedia.org/wiki/Robots.txt

Graph Connections Between Related Websites

Keyword & link analysis is the core of a successful SEO process. Although search engines give you the results of a keyword search, they do not make it easy to see how websites link to each other.

TouchGraph Google Browser is a web-based tool which helps you to explore the connections between websites based on URLs or keywords (from Google search results) with an interactive visual interface.

Visual SEO Product

Link analysis can be done really easily with this tool, and it is very impressive to see a website’s importance for a keyword that clearly.

Besides the Google Browser, TouchGraph also offers similar tools for Amazon and Facebook connection analysis.

Pagerank Checkers And SEO Tools

In general, Google & other search engines are the primary marketing channel for websites. So, checking your website’s credibility with the search engines is a good way of improving it.

Here are some major web-based (excluding software and browser extensions) pagerank check & SEO tool websites which offer resources like backlink analyzers, search engine position finders, and more.

Info: This is not a list of “all SEO tools”; these are the ones which offer many tools in one place with an easy-to-use interface.

If you find these tools handy, bookmark the post at del.icio.us.

iWEBTOOL

Online Pagerank Checker

They have almost every SEO tool you may need, including a keyword density checker and a link popularity tool.

Khrido

Free Webmaster Tools

Besides the pagerank check and other webmaster tools, they have some advanced tools like MX / NS lookup, HTML Encryptor & more.

Although Khrido is not the most popular one, it is very user-friendly.

LinkVendor

Free SEO Tools

They have a social bookmark link checker, a visual pagerank checker tool and a free SEO report tool, which are nice additions to the standard SEO tools.

SEOmoz

Advanced SEO Tools

SEOmoz provides advanced website analysis tools like finding how well your website is targeted for specific keywords or countries. Their pagerank checker & similar tools can be found here.

SEO Chat

Check Keyword Position

They provide some tools for analyzing AdSense returns or comparing Google & Yahoo search results as a chart.

SEO Book SEO Tools

Keyword Analysis

Besides the standard SEO tools, they have a unique keyword analysis tool & a Firefox pagerank checker extension that can be handy.

Google Trends

A great Google tool that gives you an idea about the popularity of keywords and websites.

If you know of a great one, please share it in the comments.

Complete Website Validation: Free Site Validator

Free Site Validator is a very handy free service for validating webpages.

Compared to the W3C validation service, which checks only 1 page at a time, this solution follows links, goes deeper & scans almost all webpages in a website.

Free Site Validator

The service uses the W3C validation service at the backend & enables you to scan multiple websites.

Report generation can take some time, considering it scans all webpages, but the service can simply notify you when the validation is completed.

Search-Based Keyword Tool From Google

If you have ever advertised with Google Adwords, you have probably gone through a keyword guessing process.

Google now helps with this through the Search-Based Keyword Tool.

Google Search Based Keyword Tool

The example demonstrates what happens if blog.theweblogix.com were to be advertised with Google Adwords & web design were a keyword to be used:

  • the tool checks the website to analyze the content
  • suggests other related keywords

Definitely a time saver for Google Adwords users.