[ic] RFC: $Vend::Robot and BounceRobotSessionURL

David Christensen david at endpoint.com
Wed Oct 7 21:01:05 UTC 2009


Folks,

Seeking comments and code review on the following two patches
(available in my GitHub repo):

"Add new $Vend::Robot variable to track when we're dealing with an  
actual RobotUA":
http://github.com/machack666/interchange/commit/44c7b91e2596ae5e4d76b29446c94346b04693d8

"Add BounceRobotSessionURL directive":
http://github.com/machack666/interchange/commit/3da6fb97b4dc9b7b871864247342d5ae88929a2b

Including the full diffs below.

Regards,

David
----- 8< -----
commit 44c7b91e2596ae5e4d76b29446c94346b04693d8
Author: David Christensen <david at endpoint.com>
Date:   Wed Oct 7 14:45:52 2009 -0500

     Add new $Vend::Robot variable to track when we're dealing with an actual RobotUA

     This makes it possible to distinguish an mv_tmp_session value supplied
     through CGI from actual robot detection, which sets mv_tmp_session
     only as a side effect.

diff --git a/lib/Vend/Server.pm b/lib/Vend/Server.pm
index ebbb7f3..a61d317 100644
--- a/lib/Vend/Server.pm
+++ b/lib/Vend/Server.pm
@@ -288,7 +288,7 @@ EOF
  #::logDebug("Check robot UA=$Global::RobotUA IP=$Global::RobotIP");
         if ($Global::RobotIP and $CGI::remote_addr =~ $Global::RobotIP) {
  #::logDebug("It is a robot by IP!");
-               $CGI::values{mv_tmp_session} = 1;
+               $Vend::Robot = 1;
         }
         elsif ($Global::HostnameLookups && $Global::RobotHost) {
                 if (!$CGI::remote_host && $CGI::remote_addr) {
@@ -297,18 +297,20 @@ EOF
                 }
                 if ($CGI::remote_host && $CGI::remote_host =~ $Global::RobotHost) {
  #::logDebug("It is a robot by host!");
-                       $CGI::values{mv_tmp_session} = 1;
+                       $Vend::Robot = 1;
                 }
         }
-       unless ($CGI::values{mv_tmp_session}) {
+       unless ($Vend::Robot) {
                 if ($Global::NotRobotUA and $CGI::useragent =~ $Global::NotRobotUA) {
                         # do nothing
                 }
                 elsif ($Global::RobotUA and $CGI::useragent =~ $Global::RobotUA) {
  #::logDebug("It is a robot by UA!");
-                       $CGI::values{mv_tmp_session} = 1;
+                       $Vend::Robot = 1;
                 }
         }
+
+    $CGI::values{mv_tmp_session} = 1 if $Vend::Robot;
  }

  # This is called by parse_multipart
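
For reviewers, here is a stand-alone Perl sketch of the detection order in the first patch: IP, then host, then user agent (with NotRobotUA taking precedence over RobotUA), with mv_tmp_session forced only as a consequence of detection. The subs detect_robot() and apply_robot_flag() and the %$cfg hash are hypothetical stand-ins for the $Global:: settings; the reverse DNS lookup done in Server.pm is left out.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the robot check in Vend::Server.
# The %$cfg keys mirror the global directives used in the patch
# (RobotIP, HostnameLookups, RobotHost, NotRobotUA, RobotUA); the sub
# itself is illustrative and is not part of Interchange's API.
sub detect_robot {
    my (%req) = @_;    # remote_addr, remote_host, useragent, cfg
    my $cfg   = $req{cfg} || {};
    my $robot = 0;

    if ($cfg->{RobotIP} and ($req{remote_addr} // '') =~ $cfg->{RobotIP}) {
        $robot = 1;    # robot by IP
    }
    elsif ($cfg->{HostnameLookups} and $cfg->{RobotHost}) {
        # The real code also does a reverse lookup when remote_host is
        # empty; that step is omitted in this sketch.
        my $host = $req{remote_host} // '';
        $robot = 1 if $host ne '' and $host =~ $cfg->{RobotHost};
    }

    unless ($robot) {
        my $ua = $req{useragent} // '';
        if ($cfg->{NotRobotUA} and $ua =~ $cfg->{NotRobotUA}) {
            # explicitly declared not to be a robot: do nothing
        }
        elsif ($cfg->{RobotUA} and $ua =~ $cfg->{RobotUA}) {
            $robot = 1;    # robot by user agent
        }
    }
    return $robot;
}

# Only a detected robot forces mv_tmp_session; a value already supplied
# by the request is left alone and does not make the client a robot.
sub apply_robot_flag {
    my ($values, $robot) = @_;
    $values->{mv_tmp_session} = 1 if $robot;
    return $values;
}
```

With this split, a browser that sends mv_tmp_session=1 itself still gets a false robot flag, which is what the second patch relies on.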

----- 8< -----
commit 3da6fb97b4dc9b7b871864247342d5ae88929a2b
Author: David Christensen <david at endpoint.com>
Date:   Wed Oct 7 12:24:52 2009 -0500

     Add BounceRobotSessionURL directive

     Add BounceRobotSessionURL directive to 301-redirect robots that
     provide an explicit mv_session_id to the canonical page URL without
     the mv_session_id.  This keeps search engines from indexing URLs
     that carry an explicit session ID.

diff --git a/WHATSNEW-5.7 b/WHATSNEW-5.7
index 37286fd..cc07588 100644
--- a/WHATSNEW-5.7
+++ b/WHATSNEW-5.7
@@ -14,6 +14,11 @@ Interchange 5.7.2 released 2009-09-17.
  Core
  ----

+* Add BounceRobotSessionURL directive to 301-redirect robots that
+  provide an explicit mv_session_id to the canonical page URL without
+  the mv_session_id.  This keeps search engines from indexing URLs
+  that carry an explicit session ID.
+
  * Close remote disclosure security vulnerability, and added new configuration
    option AllowRemoteSearch to selectively re-enable remote searches on "safe"
    tables. Defaults to products, variants and options.
diff --git a/lib/Vend/Config.pm b/lib/Vend/Config.pm
index 1468211..2ba2175 100644
--- a/lib/Vend/Config.pm
+++ b/lib/Vend/Config.pm
@@ -713,6 +713,7 @@ sub catalog_directives {
      ['UserTrack',        'yesno',            'no'],
  	['DebugHost',	     'ip_address_regexp',	''],
  	['BounceReferrals',  'yesno',            'no'],
+	['BounceRobotSessionURL',		 'yesno', 'no'],
  	['OrderCleanup',     'routine_array',    ''],
  	['SessionCookieSecure', 'yesno',         'no'],
  	['SessionHashLength', 'integer',         1],
diff --git a/lib/Vend/Dispatch.pm b/lib/Vend/Dispatch.pm
index caf3415..9acb588 100644
--- a/lib/Vend/Dispatch.pm
+++ b/lib/Vend/Dispatch.pm
@@ -1244,6 +1244,8 @@ sub dispatch {
  	$sessionid = $CGI::values{mv_session_id} || undef
  		and $sessionid =~ s/\0.*//s;

+    my $orig_sessionid = $sessionid; # save for robot check with explicit session id
+
  	$::Instance->{CookieName} = $Vend::Cfg->{CookieName};

  	if($CGI::values{mv_tmp_session}) {
@@ -1552,7 +1554,8 @@ EOF
          );
      }

-	if ($new_source and $CGI::request_method eq 'GET' and $Vend::Cfg->{BounceReferrals}) {
+	if (($new_source and $CGI::request_method eq 'GET' and $Vend::Cfg->{BounceReferrals})
+        or ($Vend::Robot and $orig_sessionid and $Vend::Cfg->{BounceRobotSessionURL})) {
  		my $path = $CGI::path_info;
  		$path =~ s:^/::;
  		my $form =
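
The bounce condition in the second patch can be illustrated with a small stand-alone Perl sketch: redirect only when the client was flagged as a robot, the request carried an explicit mv_session_id, and the directive is enabled (in catalog.cfg, e.g. `BounceRobotSessionURL yes`). The patch itself reuses the existing BounceReferrals redirect code, whose remainder is cut off above; the robot_bounce_target() helper, its arguments and the example URL are hypothetical and do not reproduce that code.

```perl
use strict;
use warnings;

# Hypothetical helper sketching the BounceRobotSessionURL decision in
# Vend::Dispatch. Returns the canonical URL to 301 to, or nothing
# (undef in scalar context) when no bounce applies.
sub robot_bounce_target {
    my (%args) = @_;   # robot, orig_sessionid, bounce_cfg, base_url, path, query
    return unless $args{robot} and $args{orig_sessionid} and $args{bounce_cfg};

    my $path = $args{path} // '';
    $path =~ s{^/}{};    # same leading-slash strip as the patch

    # Keep every query parameter except mv_session_id.
    my @pairs = grep { length($_) && $_ !~ /^mv_session_id=/ }
                split /[&;]/, ($args{query} // '');

    my $url = "$args{base_url}/$path";
    $url .= '?' . join('&', @pairs) if @pairs;
    return $url;
}

# Usage: a robot hitting a page with an explicit session id.
my $target = robot_bounce_target(
    robot          => 1,
    orig_sessionid => 'abc123',
    bounce_cfg     => 1,
    base_url       => 'http://www.example.com/cgi-bin/shop',
    path           => '/products/widget.html',
    query          => 'mv_session_id=abc123&color=red',
);
# $target is now
# 'http://www.example.com/cgi-bin/shop/products/widget.html?color=red'
```

Checking the saved $orig_sessionid rather than $sessionid matters because the mv_tmp_session branch in dispatch() may replace the session ID before the bounce check runs.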

--
David Christensen
End Point Corporation
david at endpoint.com

More information about the interchange-users mailing list