Controlling Visits by Robots
Tip 3: Robot Visits
Monitoring the activity of robots is an important part of web site administration. Robots used by search engines (e.g. Google) continually scan web sites to keep their indices up to date. Once search engines are regularly visiting a web site, you may want to control which areas of the site they visit. This is controlled by a file named robots.txt located in the root folder of the domain.
The file contains a set of directives that a robot should read before scanning a site; it states which pages are to be included in, and excluded from, such scans. You can use this facility to prevent pages from being indexed by search engines. For example, you may have a set of test pages that you don't want to be seen yet, or you might want to keep images used on the site out of public search.
Example /robots.txt:
User-agent: *
Disallow: /newversion/
Disallow: /directory.htm
This excludes access to the newversion folder and to the specific file /directory.htm. The rule applies to all robots; you can also make rules apply to specific agents if you wish, as in the example below.
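For instance, a robots.txt along the following lines (a hypothetical illustration, with Googlebot used purely as an example agent name) would restrict one named crawler while leaving all other robots unrestricted:

User-agent: Googlebot
Disallow: /newversion/

User-agent: *
Disallow:

An empty Disallow line means nothing is excluded for the robots it applies to.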
You can also indicate that you do not want robots to index a page by using the robots META tag within the page itself:
<meta name="robots" content="noindex">
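If you also want to stop robots from following the links on that page, the content attribute can list more than one directive, for example:

<meta name="robots" content="noindex, nofollow">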
For further details please visit www.RobotsTxt.org.
Site Vigil takes notice of the robots.txt directives when it scans web sites.
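As a rough sketch of how a scanner can honour these directives (this is not Site Vigil's actual implementation), Python's standard urllib.robotparser module can test whether a given URL may be fetched; the domain and paths below are assumed purely for illustration:

# Minimal sketch: check whether a crawler may fetch a URL, honouring robots.txt.
# The domain and paths are hypothetical examples.
from urllib.robotparser import RobotFileParser

robots = RobotFileParser()
robots.set_url("https://www.example.com/robots.txt")
robots.read()  # download and parse the robots.txt file

for url in ("https://www.example.com/index.htm",
            "https://www.example.com/newversion/draft.htm"):
    # can_fetch() applies the Disallow rules that match the given user agent
    allowed = robots.can_fetch("*", url)
    print(url, "->", "allowed" if allowed else "disallowed")

With the example robots.txt shown earlier, the first URL would be reported as allowed and the second as disallowed.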