What is the major use of a robots.txt file?

The /robots.txt file is a de facto standard and is not owned by any standards body. There are two historical descriptions:

the original 1994 document, A Standard for Robot Exclusion;
a 1997 Internet Draft specification, A Method for Web Robots Control.
In addition, there are external resources:

HTML 4.01 specification, Appendix B.4.1
Wikipedia - Robots Exclusion Standard
The /robots.txt standard is not actively developed. See What about further development of /robots.txt? for more discussion.

The rest of this page gives an overview of how to use /robots.txt on your server, with some simple recipes. To learn more see also the FAQ.
 
A robots.txt file is a plain-text file placed at the root of your website that tells search engine crawlers which parts of the site you do not want them to access.
 
Common situations where a robots.txt file is useful:

If you want search engines to ignore duplicate pages on your website
If you don't want search engines to index your internal search results pages
If you don't want search engines to index certain areas of your website, or the whole website
If you don't want search engines to index certain files on your website (images, PDFs, etc.)
If you want to tell search engines where your sitemap is located
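The scenarios above map directly onto robots.txt directives. A minimal sketch, assuming a site that wants to block an internal search results path and an admin area for all crawlers and advertise a sitemap (the paths and sitemap URL here are hypothetical):

```
User-agent: *
Disallow: /search
Disallow: /admin/
Sitemap: https://www.example.com/sitemap.xml
```

Note that Sitemap (and pattern extensions such as * and $ in paths) are widely supported additions used by major search engines; they are not part of the original 1994 standard.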
 
You might be surprised to hear that one small text file, known as robots.txt, could be the downfall of your website. If you get the file wrong, you could end up telling search engine robots not to crawl your site, meaning your web pages won't appear in the search results. Therefore, it's important that you understand the purpose of a robots.txt file and learn how to check that you're using it correctly.
 
The robots.txt file is used to provide instructions about the website to web robots and search engine spiders. Website owners can use robots.txt to keep cooperating web robots from accessing all or part of a site that they do not want crawled by search engines.
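Because robots.txt only works through cooperation, a well-behaved crawler checks the rules before fetching a page. A minimal sketch using Python's standard urllib.robotparser module (the rules, bot name, and URLs are hypothetical, for illustration only):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: block /admin/ and /search for all bots.
rules = """
User-agent: *
Disallow: /admin/
Disallow: /search
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)  # in practice, use parser.set_url(...) and parser.read()

# A cooperating crawler asks before fetching each URL.
print(parser.can_fetch("MyBot", "https://example.com/admin/settings"))  # False
print(parser.can_fetch("MyBot", "https://example.com/blog/post-1"))     # True
```

A crawler that skips this check (or deliberately ignores the file) is not stopped by robots.txt, which is why it should never be relied on to protect sensitive content.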
 