[ic] Just upgraded 4.8.9->5.2 - RobotUA question
brian at vermonster.com
Mon Dec 13 22:21:55 EST 2004
On Mon, 13 Dec 2004 DB <DB at M-and-D.com> wrote:
>> On Sun, 12 Dec 2004, DB wrote:
>>> I just upgraded my Foundation-based catalog from 4.8.9 to 5.2.0. I
>>> followed the UPGRADE file instructions and things went pretty
>>> smoothly. My main reason for the upgrade was to take advantage of
>>> the RobotUA feature.
>>> After the upgrade, I added the section below to the end of my
>>> interchange.cfg. However, I still see entries like this in my Apache
>>> access log:
>>> "GET /unlisted.html?id=gAW3nswb HTTP/1.0" 200 17202 "-" "ia_archiver"
>>> "GET /helpfaq.html?id=SRvEvzVq HTTP/1.0" 200 32017 "-" "msnbot/0.3
>>> I thought RobotUA prevented spiders from obtaining session
>>> IDs. Am I confused, or can someone tell me why these spiders
>>> still appear to be obtaining session IDs?
>> Are you sure that they're still obtaining session IDs? All those log
>> entries tell you is that they're successfully spidering URLs that
>> have session IDs already in them. Most likely their index of your
>> site already includes hundreds of URLs with embedded session IDs,
>> and they'll keep spidering those, getting results, and thinking
>> everything's fine.
>> The change you made says that they won't be issued a session ID,
>> which is probably working. But it can't purge their old indexes.
>> Perhaps some spiders eventually stop polling old addresses that
>> aren't linked any longer, but I don't have any evidence of that.
>> If you want to be sure, do something like:
>> GET -H 'User-agent: ia_archiver' http://yoururl
>> And look for session IDs in the URLs you get back on that page.
> Hmm, could be. How would I use that GET statement? In a Perl script?
> I'm not familiar with the syntax.
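The log lines quoted above can't tell you on their own whether a spider was just issued a fresh session ID or is replaying an old indexed URL, but tallying the distinct session IDs each user agent requests gives a rough picture. Below is a minimal Python sketch, assuming Apache's "combined" log format and that the session ID travels in an `id` query parameter, as in the entries quoted above. `session_ids_by_agent` and `report` are hypothetical helpers, not part of Interchange.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit, parse_qs

# Assumed Apache "combined" format:
#   host ident user [date] "METHOD path PROTO" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def session_ids_by_agent(lines, param="id"):
    """Map each user agent to the set of session IDs found in its request URLs.

    Assumes the session ID is passed as the query parameter `param`
    ("id" in the log entries quoted above).
    """
    seen = defaultdict(set)
    for line in lines:
        match = LOG_RE.search(line)
        if match is None:
            continue  # not in the assumed format; skip it
        query = parse_qs(urlsplit(match.group("path")).query)
        for sid in query.get(param, []):
            seen[match.group("agent")].add(sid)
    return dict(seen)


def report(path, param="id"):
    """Print how many distinct session IDs each user agent requested."""
    with open(path, encoding="latin-1") as log:
        counts = session_ids_by_agent(log, param)
    for agent, ids in sorted(counts.items(), key=lambda kv: -len(kv[1])):
        print(f"{len(ids):6d}  {agent}")
```

For example, `report("access_log")` (the log path is hypothetical). Running `session_ids_by_agent` separately over a log from before the upgrade and one from after, then comparing each spider's ID sets, is one rough way to test the replay theory: if most of the IDs a spider uses after the upgrade already appear in the older log, it is probably crawling old URLs rather than being handed new IDs.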
Drew McLellan has a nice article about testing this sort of thing with the
curl command-line tool you probably already have installed on your Linux box.
For instance, try this from a command line:
# curl --user-agent 'GoogleBot' http://yoursiteurl
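If you would rather script the check than eyeball the page, the GET/curl commands plus "look for session IDs in the URLs you get back" can be sketched in Python with only the standard library. This is a minimal sketch, assuming the session ID appears as an `id` query parameter in links, as in the log entries above; `fetch` and `urls_with_session_ids` are hypothetical helper names.

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urlsplit, parse_qs


class LinkCollector(HTMLParser):
    """Collect every href/src/action URL that appears in an HTML page."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src", "action") and value:
                self.urls.append(value)


def urls_with_session_ids(html, param="id"):
    """Return the URLs on the page that carry the query parameter `param`."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return [u for u in collector.urls if param in parse_qs(urlsplit(u).query)]


def fetch(url, user_agent):
    """Fetch `url` with a custom User-Agent, as `GET -H` or `curl --user-agent` do."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        charset = response.headers.get_content_charset() or "latin-1"
        return response.read().decode(charset, errors="replace")
```

For example, `urls_with_session_ids(fetch("http://yoursiteurl", "ia_archiver"))` should come back empty if RobotUA is matching that agent. Repeating it with an ordinary browser User-Agent string on a first, cookie-less visit would typically turn up links carrying IDs, which makes a useful comparison.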