[ic] Inktomi/Yahoo Search Engine Results include Session ID's

Gary Norton gnorton at broadgap.com
Mon Feb 23 11:37:58 EST 2004


> > > I was wondering if anyone else was experiencing any problems with this as
> > > well.
> > >
> > > To illustrate, if you go to yahoo and search for "Toyota lift kits"
> > > (http://search.yahoo.com/search?p=toyota+lift+kits&ei=UTF-8&fr=fp-tab-web-t&n=20&fl=0&x=wrt)
> > >
> > > And look at the current #2 listing (suspensionconnection.com) you will
> > > notice that the session id has been indexed.
> > >
> > > If you go even further and click "More pages from this site"
> > > (http://search.yahoo.com/search?p=toyota+lift+kits&ei=UTF-8&n=20&fl=0&fr=fp-tab-web-t&vst=0&vs=www.suspensionconnection.com)
> > >
> > > It will display the "TOP 20 WEB RESULTS out of about 15,700". All together
> > > this site should have less than 3000 pages.  If you look at many of the
> > > links you can find the same page listed several times with a different
> > > session ID.
> > >
> > My guess is that you have upgraded to Interchange 5, from 4.8 or lower,
> > and these entries are artifacts from previous spider runs.  If a spider
> > is identified, Interchange 5 (and some 4.9s) will prevent session IDs
> > from being encoded into the URI args, so you get nice clean index
> > entries.
> > Interchange versions 4.8 and earlier didn't have any spider-trap code
> > at all.
> >
> > If a search engine already has a URI with a session ID in its index
> > then it will attempt to check if the URI is still valid.  To do this,
> > it will simply request the page as part of its crawl.  Interchange will
> > happily serve the page, so the search engine will assume that the
> > index entry is correct.
> >
> > It is relatively easy to clean out the "invalid" search engine index
> > entries with a small change to the Interchange core.  Once your website
> > has been re-crawled (perhaps a month later) and the indexes are clean,
> > the extra Interchange core code can be removed.
> >
> > At least, with Interchange 5, you will not see any new session IDs in
> > the indexes.  Google, of course, is more sensible and tends to simply
> > not follow URIs with arguments at all.
> >
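
For readers who want a concrete picture of the two ideas quoted above
(leaving session IDs out of URIs for spiders, and flushing stale entries
from an index), here is a minimal sketch in Python. It is not Interchange's
Perl code, and the actual core change is not described in this thread.
Everything named here is an assumption for illustration: the `id` query
parameter, the user-agent tokens, and the choice to answer a spider with a
301 redirect to the clean URI so the stale entry eventually drops out.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical values, chosen only for this illustration.
SPIDER_TOKENS = ("slurp", "googlebot")   # substrings of crawler user agents
SESSION_PARAM = "id"                     # assumed session-ID query parameter


def is_spider(user_agent: str) -> bool:
    """Crude spider check: look for a known token in the user agent."""
    ua = user_agent.lower()
    return any(token in ua for token in SPIDER_TOKENS)


def has_session(url: str) -> bool:
    """True if the URL's query string carries the session parameter."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return any(key == SESSION_PARAM for key, _ in pairs)


def strip_session(url: str) -> str:
    """Return the URL with the session parameter removed."""
    parts = urlsplit(url)
    kept = [(k, v)
            for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k != SESSION_PARAM]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def respond(url: str, user_agent: str):
    """Return (status, location) for a request.

    Spiders asking for a URL that contains a session ID are redirected
    permanently to the clean URL; everyone else is served as usual.
    """
    if is_spider(user_agent) and has_session(url):
        return 301, strip_session(url)
    return 200, url
```

With a check like this in place, a crawler re-validating an old entry is
pointed at the clean URI instead of being served the page, while ordinary
visitors keep their sessions.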

Kevin,

You are correct, this site was originally built on 4.8. However, it has been
running on the 4.9-5.0 code for almost a year now. We did not notice this
problem until recently (i.e. the Yahoo/Inktomi switch). That doesn't mean
that the problem did not exist before, only that we did not notice it.
The site has generally ranked pretty well and we do check the rankings
frequently.

What change are you referring to for the Interchange core? This might be
something we need to look at more closely.

Thanks for your help.
- Gary



