[ic] RobotUA

Grant interchange-users@icdevgroup.org
Tue Nov 26 21:58:00 2002


>Grant [listbox@email.com] wrote:
>>
>> I've had my RobotUA all set up for a few days, but examining my rotated
>> access_log files, the robots aren't getting any further than this:
>>
>> 66.196.65.16 - - [25/Nov/2002:18:30:41 -0800] "GET /robots.txt HTTP/1.0" 200 0 "-" "Mozilla/3.0 (Slurp/si; slurp@inktomi.com; http://www.inktomi.com/slurp.html)"
>> 66.196.65.16 - - [25/Nov/2002:18:30:42 -0800] "GET / HTTP/1.0" 301 330 "-" "Mozilla/3.0 (Slurp/si; slurp@inktomi.com; http://www.inktomi.com/slurp.html)"
>>
>> Here's my RobotUA entry:
>>
>> RobotUA WebCrawler, BaiDuSpider, ZyBorg, almaden.ibm, Googlebot, Slurp,
>> Girafabot, ia_archiver, LinkWalker, MSIECrawler
>>
>One of four things could be happening:
>
>  1. Your robots.txt could be limiting access.
>  2. The spider may object to receiving a "301 Moved" status when asking
>     for a webpage.  Perhaps it suspects 'cloaking' and just stops there.
>  3. The spider may intend to return later to ask for more pages.  Some
>     spiders do this to keep the load on your server to a minimum.
>     Remember that some servers have lots of websites.
>  4. RobotUA could be broken, although I doubt it.  You can check it
>     yourself by sending "Slurp/si; slurp@inktomi.com" as your User-Agent
>     and then requesting '/'.  Check the resulting page for "unfriendly"
>     links.
>  5. I said 4, didn't I?  If you can think of another then let me know.
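
The check described in item 4 can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names are made up for this example, the User-Agent string is copied from the access_log lines above, and treating "id=" or "mv_session_id=" in a link as the sign of an "unfriendly" (session-carrying) URL is an assumption about how the catalog builds its links. http.client does not follow redirects, so a 301 shows up as-is.

```python
import http.client
import re

# User-Agent copied from the access_log entries quoted above.
SLURP_UA = ("Mozilla/3.0 (Slurp/si; slurp@inktomi.com; "
            "http://www.inktomi.com/slurp.html)")


def fetch_as_robot(host, path="/", user_agent=SLURP_UA, port=80, timeout=10):
    """GET path from host while presenting a robot User-Agent.

    Returns (status, Location header or None, body text).
    Redirects are NOT followed, so a 301 is reported directly.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        body = resp.read().decode("latin-1", errors="replace")
        return resp.status, resp.getheader("Location"), body
    finally:
        conn.close()


def describe(status, location):
    """Summarize what the robot saw for this request."""
    if status in (301, 302, 303, 307, 308):
        return f"redirect ({status}) to {location}"
    if status == 200:
        return "OK (200), page served directly"
    return f"status {status}"


def links_with_session_ids(html):
    """Return href values that appear to carry a session id.

    Assumption: an 'unfriendly' link is one whose query string
    contains id= or mv_session_id=.
    """
    hrefs = re.findall(r'href="([^"]+)"', html, flags=re.IGNORECASE)
    return [h for h in hrefs if re.search(r"[?&;](id|mv_session_id)=", h)]


# Example use (needs network access; the host is the placeholder
# name from this thread):
#   status, location, body = fetch_as_robot("www.mystore.com")
#   print(describe(status, location))
#   print(links_with_session_ids(body))
```

If the first call reports a redirect, repeating it with the path from the Location header shows the page the spider would have reached, and an empty list from links_with_session_ids on that page would suggest RobotUA is matching.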

First of all, thanks a lot to Kevin, Phillip, and Jonathon for answering my
question.  After using the Sam Spade browser to make sure the links were
friendly, I'm thinking it must be #2 above.  How can I get www.mystore.com
to forward to www.mystore.com/cgi-bin/catalog/index.html without issuing a
301?  I was using .htaccess and the RedirectPermanent directive to
accomplish that redirect, but that definitely returns a 301.  What can I do
to make a clean switch there?

- Grant