[ic] Removing session ID from the URL

Grant emailgrant at gmail.com
Fri Jan 14 12:14:56 EST 2005


Hi Jonathan,

> > > > I'm experimenting with Google AdSense and Google sees
> > > > /page.html?id=abcd as a page not in its index although /page.html is.
> > > > This means it serves the default (non-targeted) ad to any page
> > > > accessed with a session ID in the URL.  I realize the id must be in
> > > > the URL to provide functionality for users not accepting session
> > > > cookies, but I think that's a tiny fraction at this point.  Is there
> > > > any way to disable the id?  Should I use Rewrite rules for this?
> > >
> > > With the correct Robot* directives in interchange.cfg Google bots
> > > should never see session ids.
> >
> > That's actually the problem.  Google indexes pages without the session
> > ID in the URL.  If you visit the same pages with an appended session
> > ID, Google will consider them different pages.  That means the AdSense
> > ads Google serves to the pages you visit won't be targeted to the
> > content of those pages.
> 
> In my experience this is not true.
> 
> I think you are confusing indexing for search engine results with indexing
> for adsense. These are two very separate things.
> 
> If you examine google's behaviour with an adsense site you will see that
> there is a different user agent for adsense - Google is having a look at the
> page you requested, which it then parses for adwords targeting. This is
> completely different to your site being listed in google's search results.
> If what you say was the case, it would not be possible to serve targeted ads
> to a site not in Google's main index, which of course makes no sense and
> certainly is not the case.
> 
> I did a quick experiment with an IC site serving adsense. It is set up in
> the normal way, no out of the ordinary or special directives. It's based on
> the standard demo of early IC 5.3. I looked at the ads served on the second
> page access which contains the session ID and found them to be just as
> targeted as on the first page. Just in case G had already seen the URL with
> this particular session ID, I added some more characters to the session ID -
> Google had _definitely_ not seen that url. Ads still well targeted.
> 
> You can see this for yourself: http://www.free-polyphonic-ringtones-now.com/
> and also if you add 'session_please' to the end of the url this will enable
> session IDs - still the adsense works. (adsense is right at the bottom)
> 
> I think maybe in your case you have just not given G enough time to settle
> down with your site. Content and theme information needs to propagate around
> the datacenters before you will see consistent, targeted ads.

AdSense considers /page.html and /page.html?id=abc to be different
pages, so ads served on the second URL won't be targeted to the content
it shares with the first.  I think you must be seeing ads targeted to your
site as a whole, not to the individual page, when there is a new session ID
in the URI.  AdSense seems to incorporate that type of baseline targeting,
so if it isn't familiar with a certain page (i.e. a page with an appended
session id) it will serve an ad targeted to the baseline (the site).
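
To make the "same page, different URL" point concrete, here is a minimal
Python sketch.  It assumes the only session-related query parameters are
id and mv_pc (the two named in this thread); the helper name
strip_session_params is hypothetical, and this is not a claim about how
AdSense or Interchange compare URLs internally.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: these are the only session-related parameters in the URL.
SESSION_PARAMS = {"id", "mv_pc"}


def strip_session_params(url):
    """Return the URL with session-related query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


# The two URLs differ as strings but name the same page once the
# session parameter is removed.
print(strip_session_params("/page.html?id=abcd") == "/page.html")
```

A crawler that compares the raw strings treats the two as separate pages;
only a comparison after this kind of normalization would match them.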

> > It looks like IC usually appends a session ID to all links on the
> > first page of the session.  If IC gets a session cookie back from the
> > user after that, it keeps the session IDs out of the links.  Is there
> > any way to keep IC from appending session IDs right away?  If it would
> > wait until page #2 of the session, cookie-friendly sessions would
> > never get the id-appended URL (as long as they're POSTing).  That
> > would mean no lost AdSense opportunities.
> >
> > Also, I've been noticing that after a while IC will start serving pages
> > with session IDs appended to all URLs.  I'm seeing it in Firefox 1.0
> > on Linux.  I wonder if it's a Firefox or IC issue.  Has anyone else
> > noticed this?
> 
> As has been mentioned this happens if you visit the admin UI, and also if a
> visitor logs into their account, or buys something.

Ah, I didn't know that behavior started up again after a user logs in
or purchases.
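
For reference, the link behavior discussed above (ID appended on the first
page, dropped once a session cookie comes back) and the "wait until page
#2" idea can be sketched roughly as follows.  This is only an illustration
in Python under those stated assumptions: link_for and defer_first_page are
hypothetical names, not Interchange code or settings, and the login and
purchase cases are left out.

```python
def link_for(path, session_id, cookie_seen, page_number, defer_first_page=False):
    """Return the href for a link on the given page of a session."""
    if cookie_seen:
        # The browser returned the session cookie, so links stay clean.
        return path
    if defer_first_page and page_number == 1:
        # Proposed variation: give the cookie a chance before adding the ID.
        return path
    # No cookie seen: carry the session in the URL instead.
    return f"{path}?id={session_id}"


# Behavior as described in the thread: the first page gets the ID.
print(link_for("/page.html", "abcd", cookie_seen=False, page_number=1))
# With the proposed deferral, the first page stays clean.
print(link_for("/page.html", "abcd", cookie_seen=False, page_number=1,
               defer_first_page=True))
```

With deferral, a visitor who accepts cookies would never see an
id-appended link, while a visitor who refuses them would only get one from
the second page on.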

> One other thing, if you look at the adwords serving code, G is passed the
> page and also its referring page each time it serves ads. In any case, this
> is unimportant as Adwords works fine with IC out of the box.

Do you mean Google is passed those parameters explicitly in the code?  Where?

<script type="text/javascript"><!--
google_ad_client = "pub-mynumber";
google_ad_width = 468;
google_ad_height = 60;
google_ad_format = "468x60_as";
google_ad_type = "text";
google_color_border = "000000";
google_color_bg = "FFFFFF";
google_color_link = "FF0000";
google_color_url = "000000";
google_color_text = "000000";
//--></script>
<script type="text/javascript"
src="http://pagead2.googlesyndication.com/pagead/show_ads.js"></script>

I also wanted to say that I'm going to try commenting out all Robot*
directives and see how it goes.  mv_no_session and mv_no_count will be
set to keep id and mv_pc out of the URI for everyone.

The downsides are the performance and storage hit of keeping track of
sessions for robots, the exclusion of shoppers who don't accept
session cookies, and the minor issue of browser page caching.

The upsides are no Robot* directives to maintain, no chance of a false
positive de-sessioning a real user, and new robots and robots with a
recently modified UA will always have clean pages to crawl.  Plus it's
the simple way to go, and I'm rarely disappointed going that route.

If anyone thinks I'm blowing it here, I'd really like to hear about it.

Lastly, I've decided to abandon AdSense on my site.  If anyone cares
about my reasons, let me know and I'll post them to the list.

- Grant

> best wishes,
> 
> Jonathan.

