Is Yahoo Slurp misbehaving?
Complaints
Webmasters complain that Yahoo Slurp is not honoring the filters given in robots.txt for the following file:
User-agent: msnbot
Disallow: /bloop/
Disallow: /blop/

User-agent: googlebot
Disallow: /bloop/
Disallow: /blop/

User-agent: Slurp
Disallow: /bloop/
Disallow: /blop/

User-agent: *
Disallow: /shop/
Disallow: /forum/
Disallow: /cgi-bin/
Yahoo Slurp obeys its agent-specific rule and hence does not crawl the /bloop/ and /blop/ directories, whereas it did crawl the directories listed under the generic rule (/shop/, /forum/ and /cgi-bin/).
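The behavior described above can be reproduced offline with Python's standard urllib.robotparser module. This is only a sketch: it shows how one parser reads the file, on the assumption that the real crawlers pick their group in a similar way. The agent name "ExampleCrawler" and the page paths are hypothetical, chosen just for the illustration.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the complaint, copied as-is.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /bloop/
Disallow: /blop/

User-agent: googlebot
Disallow: /bloop/
Disallow: /blop/

User-agent: Slurp
Disallow: /bloop/
Disallow: /blop/

User-agent: *
Disallow: /shop/
Disallow: /forum/
Disallow: /cgi-bin/
"""


def build_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleCrawler" is a hypothetical bot with no group of its own,
# so it falls through to the "User-agent: *" group.
for agent in ("Slurp", "ExampleCrawler"):
    for path in ("/bloop/page.html", "/shop/item.html"):
        verdict = "allowed" if parser.can_fetch(agent, path) else "blocked"
        print(f"{agent:15} {path:18} {verdict}")
```

With this parser, Slurp is blocked from /bloop/ but allowed into /shop/, while the bot without its own group is blocked from /shop/ but allowed into /bloop/. That matches what the webmasters observed.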
Discussions and Suggestions
Webmasters figured out that not only Yahoo but other search engines as well behaved in the same manner.
A search engine bot that finds a User-agent group naming it follows that group only and does not also obey the generic (User-agent: *) rule.
Suggestion 1: Put the wildcard group with the generic rules first, and then place the agent-specific groups after it.
Suggestion 2: The main idea is that the major bots go to their own User-agent groups and the rest carry on with the wildcard group, so any generic rule the major bots should follow has to be repeated in their own groups.
Conclusion
The bots do honor robots.txt as specified. What trips them up is the way the file on the server is written, not a fault in the bots.
The fact is that the major bots each follow the group for their own user agent, and the rest follow the generic rule.
The correct rules for the above case would be:
User-agent: Slurp
Disallow: /bloop/
Disallow: /blop/
Disallow: /shop/
Disallow: /forum/
Disallow: /cgi-bin/
Disallow: /badbottrap/

User-agent: msnbot
Disallow: /bloop/
Disallow: /blop/
Disallow: /shop/
Disallow: /forum/
Disallow: /cgi-bin/
Disallow: /badbottrap/

User-agent: googlebot
Disallow: /bloop/
Disallow: /blop/
Disallow: /shop/
Disallow: /forum/
Disallow: /cgi-bin/
Disallow: /badbottrap/

User-agent: *
Disallow: /shop/
Disallow: /forum/
Disallow: /cgi-bin/
Disallow: /badbottrap/
This can be shortened to:
User-agent: Slurp
User-agent: msnbot
User-agent: googlebot
Disallow: /bloop/
Disallow: /blop/
Disallow: /shop/
Disallow: /forum/
Disallow: /cgi-bin/
Disallow: /badbottrap/

User-agent: *
Disallow: /shop/
Disallow: /forum/
Disallow: /cgi-bin/
Disallow: /badbottrap/
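As a quick sanity check, the shortened file can be run through Python's standard urllib.robotparser to see which paths each bot ends up blocked from. This is a sketch under the assumption that the crawlers group consecutive User-agent lines the way this parser does. The helper blocked_paths, the agent "ExampleCrawler" and the /index.html path are hypothetical names used only for the illustration.

```python
from urllib.robotparser import RobotFileParser

# The shortened robots.txt: three User-agent lines share one rule set.
SHORTENED = """\
User-agent: Slurp
User-agent: msnbot
User-agent: googlebot
Disallow: /bloop/
Disallow: /blop/
Disallow: /shop/
Disallow: /forum/
Disallow: /cgi-bin/
Disallow: /badbottrap/

User-agent: *
Disallow: /shop/
Disallow: /forum/
Disallow: /cgi-bin/
Disallow: /badbottrap/
"""

# Every directory mentioned in the file, plus one page nobody disallows.
PATHS = ["/bloop/", "/blop/", "/shop/", "/forum/",
         "/cgi-bin/", "/badbottrap/", "/index.html"]


def blocked_paths(text, agent, paths):
    """Return the subset of paths that the given agent may not fetch."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, p)]


for agent in ("Slurp", "msnbot", "googlebot", "ExampleCrawler"):
    print(agent, blocked_paths(SHORTENED, agent, PATHS))
```

Under this parser the three named bots are blocked from all six directories, and any other bot is blocked only from the four directories in the wildcard group.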