[ic] new idea, feedback please

Jonathan Clark interchange-users@icdevgroup.org
Mon Oct 28 15:35:01 2002


> I think I may have come up with a way that would work for me to keep my
> URLs search engine friendly.  I do have a couple questions though, so
> please let me know what you guys think.
>
> The idea is to write a new usertag to replace the area tag.  It would
> check to see if the page argument is AlwaysSecure or not and write either
> http:// or https:// based on that, then __SERVER_NAME__, then __CGI_URL__,
> then /(pageargument).html.  That would keep id= and mv_pc= out of the
> generated links and the search engine spiders would then be able to
> traverse all of the links.  I would always use the new tag to generate
> links, and cookies would just be required to shop at my store.  Here are
> my questions:
>
> 1. Will not having mv_pc= functionality cause a problem?  I realize it is
> there to prevent browser page-caching.  If it's not there, will users be
> viewing old versions of my pages after I've updated them?
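
For reference, I read the usertag idea as something along these lines. This
is only a rough, untested sketch: the tag name is made up, and the way it
reaches SERVER_NAME, CGI_URL and the AlwaysSecure list through $Vend::Cfg is
my assumption, so check it against your Interchange version before relying
on it.

# catalog.cfg -- hypothetical "static_url" tag (untested sketch)
UserTag static_url Order page
UserTag static_url Routine <<EOR
sub {
    my ($page) = @_;

    # Assumption: AlwaysSecure pages end up as a hash in the catalog
    # config; check how your version actually stores this directive.
    my $secure = $Vend::Cfg->{AlwaysSecure}{$page} ? 1 : 0;
    my $proto  = $secure ? 'https' : 'http';

    # Assumption: SERVER_NAME and CGI_URL are plain catalog Variables.
    my $host = $Vend::Cfg->{Variable}{SERVER_NAME};
    my $cgi  = $Vend::Cfg->{Variable}{CGI_URL};

    # Build a plain link with no id= or mv_pc= parameters.
    return "$proto://$host$cgi/$page.html";
}
EOR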

Doing this blindly, without regard for whether the requestor is a browser, a
spider or something else, is a bad idea.

There is a new feature in 4.9.3, crafted by Mike Heins, which does what you
require when Interchange recognises a spider as the requestor.

Documentation for the new feature:

<quote docs>

RobotUA

The RobotUA directive defines a list of User Agents which will be classed as
crawler robots (search engines) and causes Interchange to alter its
behaviour to improve the chance of Interchange-served content being crawled
and hopefully listed.

The directive accepts a wildcard list - * represents any number of
characters, ? represents a single character. The elements of the list are
separated by a comma.

If a User Agent is recognised as a robot, the following will be performed by
Interchange:

   * mv_tmp_session scratch variable is set to 1, disabling sessions and
therefore the writing of session data to disk.
   * mv_no_session_id scratch variable is set to 1, causing Interchange to
generate URLs without a session id (eg. mv_session_id=KvWna2PT).
   * mv_no_count scratch variable is set to 1, causing Interchange to
generate URLs without the incremental number normally used to prevent proxy
caching (eg. mv_pc=4).
   * [area] generated URLs will not contain the session id (eg.
id=KvWna2PT).

Of course, these behavioural changes will not be persistent, as the
requested page is generated without a session.


It should be noted that once you have identified that you are serving a page
to a robot, you should not use this to massively alter your page content in
an attempt to improve your ranking. If you do this, you run the risk of
being blacklisted. You have been warned!

Example:
  	  RobotUA   Inktomi, Scooter, *Robot*, *robot*, *Spider*, *spider*

See also:
	RobotIP

RobotIP

The RobotIP directive defines a list of IP addresses (or leading portions of
addresses) which will be classed as crawler robots (search engines) and
causes Interchange to alter its behaviour to improve the chance of
Interchange-served content being crawled and hopefully listed.

The directive accepts a wildcard list - * represents any number of
characters, ? represents a single character. The elements of the list are
separated by a comma.

See RobotUA for a full description of the behavioural changes.

Example:
	  RobotIP   209.135.65, 64.172.5
</quote docs>
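
Until you can move to 4.9.3, my understanding (untested, so treat it as a
sketch only) is that you can get much the same effect for a given page by
setting those scratch variables yourself, for instance from an embedded Perl
block near the top of the page:

[perl]
    # Untested sketch: set the same scratch variables the robot handling
    # sets, so that [area ...] links built later in this page come out
    # without id= and mv_pc= parameters.
    $Scratch->{mv_tmp_session}   = 1;
    $Scratch->{mv_no_session_id} = 1;
    $Scratch->{mv_no_count}      = 1;
    return '';
[/perl]

That is essentially what the RobotUA/RobotIP handling automates, keyed off
the requesting User Agent or IP address instead of being hard-coded into a
page.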

Jonathan
Webmaint.