What is the use of robots.txt?

What Is Robots.txt?

Robots.txt is a text (not html) file you put on your site to tell search robots which pages you would like them not to visit. Robots.txt is by no means mandatory for search engines but generally search engines obey what they are asked not to do. It is important to clarify that robots.txt is not a way from preventing search engines from crawling your site (i.e. it is not a firewall, or a kind of password protection) and the fact that you put a robots.txt file is something like putting a note “Please, do not enter” on an unlocked door – e.g. you cannot prevent thieves from coming in but the good guys will not open to door and enter. That is why we say that if you have really sen sitive data, it is too naïve to rely on robots.txt to protect it from being indexed and displayed in search results.

The location of robots.txt is very important. It must be in the main directory because otherwise user agents (search engines) will not be able to find it – they do not search the whole site for a file named robots.txt. Instead, they look first in the main directory (i.e. http://mydomain.com/robots.txt) and if they don't find it there, they simply assume that this site does not have a robots.txt file and therefore they index everything they find along the way. So, if you don't put robots.txt in the right place, do not be surprised that search engines index your whole site.

The concept and structure of robots.txt has been developed more than a decade ago and if you are interested to learn more about it, visit The Web Robots Pages or you can go straight to the Standard for Robot Exclusion because in this article we will deal only with the most important aspects of a robots.txt file. Next we will continue with the structure a robots.txt file.
 
User-agent: *
Disallow: /

The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.

There are two important considerations when using /robots.txt:

robots can ignore your /robots.txt. Especially malware robots that scan the web for security vulnerabilities.
the /robots.txt file is a publicly available file. Anyone can see what sections of your server you don't want robots to use.
 
Robots.txt is the common name of a text file that is uploaded to a Web site's root directory and linked in the HTML code of the Web site. The robots.txt file is used to provide instructions to the Web site to Web robots and spiders. Web authors can use robots.txt to keep cooperating Web robots from accessing all or parts of a Web site that you want to keep private.
 
Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.
 
A spiders.txt computer file is a written text computer file that prevents web spider software, such as Googlebot, from creeping certain pages of your site. The computer file is basically a list of instructions, such Allow and Stop , that tell web spiders which URLs they can or cannot recover.
 
Robots.txt is a text file, It gives instruction to search engine crawlers about indexing and caching of a web page file of a website of directory, domain.
 
The robots.txt is a simple text file in your web site that informs search engine bots how to crawl and index website or web pages. It is great when search engines frequently visit your site and index your content but often there are cases when indexing parts of your online content is not what you want.
 
Robots.txt is a text (not html) file you put on your site to tell search robots which pages you would like them not to visit. Robots.txt is by no means mandatory for search engines but generally search engines obey what they are asked not to do. It is important to clarify that robots.txt is not a way from preventing search engines from crawling your site (i.e. it is not a firewall, or a kind of password protection) and the fact that you put a robots.txt file is something like putting a note “Please, do not enter” on an unlocked door – e.g. you cannot prevent thieves from coming in but the good guys will not open to door and enter. That is why we say that if you have really sen sitive data, it is too naïve to rely on robots.txt to protect it from being indexed and displayed in search results.
 
Last edited:
Back
Top