What is Robots.txt? What Does It Do?
Robots.txt is a simple text file placed in the root directory of a website (for example, https://example.com/robots.txt) that tells search engine bots which pages they may crawl and which they may not. When a search engine visits a site, it usually checks the robots.txt file first and follows the rules specified there. This gives the webmaster control over which parts of the site are crawled.
Sample Robots.txt File
The two most commonly used directives in the file are Disallow and Allow.
- Disallow tells bots not to access a specific page or folder.
- Allow permits access to a specific path and is typically used to carve out an exception to a broader Disallow rule; it is supported by major crawlers such as Googlebot.
User-agent: *
Disallow: /admin/
Disallow: /drafts/
Allow: /drafts/about.html
In this example, all bots (the * wildcard) are blocked from /admin/ and /drafts/. However, the about.html page inside /drafts/ may still be crawled, because the more specific Allow rule overrides the broader Disallow.
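A quick way to test rules like these is Python's built-in urllib.robotparser module. The sketch below parses the example rules and checks a few hypothetical URLs on example.com. One caveat: Python's parser applies the first matching rule in file order, unlike Google, which prefers the most specific path, so the Allow line is listed before the broader Disallow here.

from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Allow: /drafts/about.html
Disallow: /admin/
Disallow: /drafts/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())
# For a live site, use rp.set_url("https://example.com/robots.txt") and rp.read() instead.

# Check which URLs a generic bot ("*") may fetch:
print(rp.can_fetch("*", "https://example.com/admin/login"))        # False: blocked
print(rp.can_fetch("*", "https://example.com/drafts/post.html"))   # False: blocked
print(rp.can_fetch("*", "https://example.com/drafts/about.html"))  # True: explicit exception
print(rp.can_fetch("*", "https://example.com/blog/hello"))         # True: no rule applies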
The robots.txt file is especially important for shaping how site content appears to search engines. However, it only restricts crawling; it does not reliably prevent content from being indexed (a blocked page can still end up in the index if other sites link to it). If a page should not appear in search results at all, the <meta name="robots" content="noindex"> tag should be used instead, and the page must remain crawlable so that bots can actually see the tag.
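As a minimal illustration, the tag belongs inside the page's <head> section (the page shown here is hypothetical):

<!DOCTYPE html>
<html>
<head>
  <title>Private Draft</title>
  <!-- Tells compliant crawlers not to show this page in search results -->
  <meta name="robots" content="noindex">
</head>
<body>
  ...
</body>
</html>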