[ic] mv_no_session idea (best of both worlds)

Grant emailgrant at gmail.com
Fri Jul 13 11:39:01 EDT 2007


> > > My advice would be to use "ScratchDefault mv_no_session_id 1" and to
> > > forget all about the "mv_no_session" scratchpad variable.
> > >
> > This is tricky.  It's a balancing act between server load,
> > functionality, and clean URLs for new/altered spiders and links
> > pointing to the site.
> >
> Search engine spiders are pretty much covered by the Robot*
> configuration directives.  Recognised spiders will not be shown
> session IDs in any of the links in your pages.
>
> You can get the latest robots.cfg file from here:
>
>     http://www.interchange.rtfm.info/downloads.html
>
> (Scroll down to find and download the robots.cfg file.)
>
> Once you have that file, you can do one of two things:
>
>     1. (recommended) Throw the file into the "etc" directory, found
>        under your Interchange installation's home directory, and
>        then modify your interchange.cfg (1) to "include etc/robots.cfg",
>        and (2) to remove any existing Robot* directives.
>
>     2. Replace any existing Robot* and NotRobot* directives in
>        your interchange.cfg file with the content of the new
>        robots.cfg file.
>
> Either of the above methods will work, but the first method will
> make it easier for you to keep the file up to date.  Just replace
> the file whenever a new version is available.
>
> Note that the new configuration will not be used until you restart
> Interchange.
>
> Please send suggestions for new Robot* directive entries either to
> me, or to this mail list.  I will review the suggestions, and add
> them to the file if I decide that it is correct to do so.
>
> Links pointing to the site (including bookmarks) are not a
> problem.  If a session ID is specified in a link then that session
> will only be valid if (1) the session has not expired and (2) the
> user's IP address or hostname matches the values stored in the
> session.  If the session is found to be invalid then a new session
> will be silently created, with a different ID, and all links on the
> initial page will be created to point at the new ID.
>
> >
> > If I just set mv_no_session, a new session will be created for each
> > request if the browser doesn't accept cookies, right?
> >
> Correct.  A new session with every request will effectively mean that
> no session is maintained at all, but with the added overhead of creating
> wasted session files.  You will probably also need to increase the
> length of your session ID string to increase the number of session IDs
> that can be created.
>
> CPU time will be wasted in maintaining these unused sessions, disk
> space and inodes will be wasted by these unused session files, and
> further CPU time and disk load will be wasted while Interchange looks
> for an unused session ID with every request.
>
> Session IDs are created at random.  If you have a lot of "used" session
> IDs then the random ID might be a duplicate.  Interchange has to check
> and allocate a new ID if required, and keep doing that in a loop until
> an unused ID is found.  This takes time.
>
> I must admit that I'm struggling to see the problem that you're trying
> to solve here.

Hi Kevin,

There are two main issues to balance against each other:

1. too many sessions cause performance problems, and site
functionality is hampered without a consistent session

2. unclean URLs lead to *search engine* confusion

The first problem is created by using mv_no_session, and the second is
created by not using mv_no_session.  I think we're clear on #1, but #2
exists because of this:

The RobotUA directive (for example) can only keep up with new spiders
and spiders that drastically change their UA as fast as the admin can
update the directive.  Because of this, spiders can slip past the
filter and index pages such as:

mydomain.com/page.html?id=abc123
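
To illustrate the gap, here is a rough Python sketch of a user-agent filter.
It is only an illustration: the pattern list, the substring matching rule,
and the function names are my own assumptions and not Interchange's actual
RobotUA code.

```python
# Hypothetical illustration only: the patterns and the matching rule are
# assumptions, not Interchange's actual RobotUA implementation.
KNOWN_ROBOT_PATTERNS = ["Googlebot", "Slurp", "msnbot"]  # assumed sample list


def is_known_robot(user_agent, patterns=KNOWN_ROBOT_PATTERNS):
    """Return True if any known pattern appears in the User-Agent string."""
    ua = user_agent.lower()
    return any(p.lower() in ua for p in patterns)


def link_for(page, session_id, user_agent):
    """Build a link, appending the session ID only for non-robots."""
    if is_known_robot(user_agent):
        return page
    return f"{page}?id={session_id}"


# A recognised spider gets the clean URL.
print(link_for("page.html", "abc123", "Mozilla/5.0 (compatible; Googlebot/2.1)"))
# A spider missing from the list is treated like a browser and sees the ID.
print(link_for("page.html", "abc123", "BrandNewSpider/1.0"))
```

Until the admin adds a pattern for the new spider, it keeps receiving (and
possibly indexing) URLs that carry a session ID.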

Secondly, a user might link to page.html with the above URL.  This
could happen if the user doesn't accept cookies, or if they do accept
cookies but copy the URL after their second click, since the session
ID is appended to the URL at that point even when cookies are
accepted.

If the above page is indexed in the search engines, or the link is
evaluated by a search engine to increase the page's PageRank (in
Google's case) it will be considered a different page than this:

mydomain.com/page.html

That causes a serious problem with search engine rankings in the form
of duplicate content and PageRank being split between the pages.
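
To make the duplicate-URL point concrete, here is a minimal Python sketch.
It is not something Interchange or any search engine does; the "http://"
scheme and the helper name strip_session_id are assumptions added for the
example, and the parameter name "id" is taken from the URL above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SESSION_PARAM = "id"  # assumed name, taken from the example URL above


def strip_session_id(url, param=SESSION_PARAM):
    """Return the URL with the session parameter removed from its query."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment)
    )


with_id = "http://mydomain.com/page.html?id=abc123"
clean = "http://mydomain.com/page.html"

# Compared as plain strings, the two URLs are different pages.
print(with_id == clean)
# Once the session ID is stripped, they are the same page.
print(strip_session_id(with_id) == clean)
```

A crawler that compares URLs as plain strings has no way of knowing that
"id" is a throwaway session token, so it treats the two as separate pages.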

I'm starting to think 'ScratchDefault mv_no_session_id 1' is the way
to go, although it's definitely a trade-off because of #2.

- Grant

