Wednesday, September 22, 2004

Web Development News: Web Spider Traps - robots.txt Traps and Robot Detection

"On the pages of Web Spider Traps is one of the most comprehensive studies of robot behavior patterns I have seen. Using craftily constructed robots.txt files, .htaccess files, detection scripts, and a variety of other techniques, this page has detected, caught, and found guilty numerous spiders that violate robots.txt and the robots meta element of your web pages...

For instance, Googlebot was shown to follow its orders in robots.txt except for files of type pdf, tar, and zip. The trap has also caught red-handed www.dir.com (Pompos), Gigabot, ia_archiver, and Yahoo! Slurp, to name a few.

Spiders that do not follow robots.txt rules or do not limit bandwidth usage include WebCrawler, Ask Jeeves, MSNbot/0.1, msnbot/0.11, and several others."

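Checking an existing access log against robots.txt can also be automated with Python's standard robots.txt parser; the sketch below flags every request the rules should have forbidden. The Apache combined log format and the direct matching of the User-Agent header against robots.txt tokens are my own simplifications, not the article's method:

    #!/usr/bin/env python
    # Flag requests in an Apache combined-format access log that the
    # site's robots.txt should have forbidden.
    import re
    import urllib.robotparser

    rp = urllib.robotparser.RobotFileParser()
    with open("robots.txt") as f:
        rp.parse(f.read().splitlines())

    # ip - - [time] "GET /path HTTP/1.x" status bytes "referer" "user-agent"
    line_re = re.compile(
        r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (\S+)[^"]*" \d+ \S+ "[^"]*" "([^"]*)"'
    )

    with open("access.log") as f:
        for line in f:
            m = line_re.match(line)
            if m:
                ip, path, agent = m.groups()
                if not rp.can_fetch(agent, path):
                    print("violation: %s %s %s" % (ip, agent, path))

A similar pass over the timestamps (requests per user-agent per minute) picks out the spiders that hog bandwidth. And once a robot is caught red-handed, the .htaccess technique the article mentions can shut it out by User-Agent; in Apache that looks something like this (BadBot is a placeholder name):

    SetEnvIfNoCase User-Agent "BadBot" block_bot
    Order Allow,Deny
    Allow from all
    Deny from env=block_bot
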
This work is licensed under a Creative Commons License.