What is robots.txt ?

Robots.txt is a text (not html) file you put on your site to tell search robots which pages you would like them not to visit. Robots.txt is by no means mandatory for search engines but generally search engines obey what they are asked not to do. It is important to clarify that robots.txt is not a way from preventing search engines from crawling your site (i.e. it is not a firewall, or a kind of password protection) and the fact that you put a robots.txt file is something like putting a note “Please, do not enter” on an unlocked door – e.g. you cannot prevent thieves from coming in but the good guys will not open to door and enter. That is why we say that if you have really sen sitive data, it is too naïve to rely on robots.txt to protect it from being indexed and displayed in search results.
 
It is great when search engines frequently visit your site and index your content but often there are cases when indexing parts of your online content are not what you want. For instance, if you have two versions of a page (one for viewing in the browser and one for printing), you'd rather have the printing version excluded from crawling, otherwise you risk being imposed a duplicate content penalty.
 
Back
Top