[ic] Googlebot Getting 500 Errors ... but he's the only one

Bryan Gmyrek bryangmyrek at yahoo.com
Mon May 30 19:49:03 EDT 2005


Hi,

By the way, the site I'm having the trouble with is http://www.neartexpress.com/ 
I haven't been able to check email much lately, so sorry for the late reply.
Thanks, Jonathan, for providing that script!  I ran it and got the same results as Mike (everything
OK).
But it is interesting that Googlebot sends the If-Modified-Since header and other robots don't ...
maybe this has something to do with it after all?
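
One way to probe the If-Modified-Since theory is to send a conditional GET that looks like Googlebot's and see which status comes back. What follows is only a minimal sketch in Python, not the script Jonathan provided; the function names are made up for illustration, and the user-agent string is copied from the access log. Note that urllib speaks HTTP/1.1, whereas the log shows Googlebot using HTTP/1.0, so this does not reproduce the request exactly.

```python
import urllib.error
import urllib.request
from email.utils import formatdate

# User-agent string copied from the access log in this thread.
GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"


def build_conditional_request(url, since_epoch, user_agent=GOOGLEBOT_UA):
    """Build a GET request carrying an If-Modified-Since header.

    Hypothetical helper: since_epoch is a Unix timestamp.
    """
    headers = {
        "User-Agent": user_agent,
        # HTTP dates are always expressed in GMT.
        "If-Modified-Since": formatdate(since_epoch, usegmt=True),
    }
    return urllib.request.Request(url, headers=headers)


def fetch_status(request, timeout=30):
    """Send the request and return the HTTP status code."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 304 and 5xx responses; the code is what we want.
        return err.code
```

Calling fetch_status(build_conditional_request(url, some_timestamp)) repeatedly against a few product pages, with and without the If-Modified-Since header, would show whether the header alone is enough to trigger a 500.
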
Yahoo! Slurp and msnbot both grabbed a bunch of pages today too:
pages grabbed, useragent
   6559 Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
    892 msnbot/1.0 (+http://search.msn.com/msnbot.htm)
    573 Googlebot/2.1 (+http://www.google.com/bot.html)

But Googlebot is the only robot that got a bunch of errors:
500 Error Code report:
pages grabbed, useragent, error
    128 Googlebot/2.1 (+http://www.google.com/bot.html)*500
     36 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)*500
     33 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)*500
     13 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)*500

(I have a Perl script that runs reports and emails me hourly ... so I'm painfully aware of the problem.)
For this site Google traffic is its lifeblood, so this problem is really killing it.
One interesting thing to note is that Googlebot 'comes from' tons of different IPs, whereas most
other spiders hit a site from one IP (I have the info to show this if anyone's interested).  I don't
know if that has anything to do with it, though...
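
For anyone who wants to produce the same kind of report, here is a rough sketch in Python (the actual reports come from a Perl script that is not shown here). It assumes one log entry per line with eight '*'-separated fields in the order seen in the excerpts in this message: IP, timestamp, request, referer, user agent, status, bytes, and a trailing field. The function names are hypothetical.

```python
from collections import defaultdict


def parse_line(line):
    """Split one '*'-delimited access_log line into named fields, or return None."""
    parts = line.rstrip("\n").split("*")
    if len(parts) != 8:
        return None  # wrapped, truncated, or contains a stray '*'
    ip, when, request, referer, agent, status, size, _ = parts
    return {"ip": ip, "when": when, "request": request, "referer": referer,
            "agent": agent, "status": status, "size": size}


def summarize_by_agent(lines):
    """Return {agent: {"hits": n, "errors": n, "ips": set}} for the given lines."""
    summary = defaultdict(lambda: {"hits": 0, "errors": 0, "ips": set()})
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        row = summary[entry["agent"]]
        row["hits"] += 1
        row["ips"].add(entry["ip"])
        if entry["status"] == "500":
            row["errors"] += 1
    return dict(summary)


def print_report(summary):
    """Print 500 count, total hits, distinct IPs and agent, worst offenders first."""
    ranked = sorted(summary.items(),
                    key=lambda item: (item[1]["errors"], item[1]["hits"]),
                    reverse=True)
    for agent, row in ranked:
        print(f'{row["errors"]:7d} {row["hits"]:7d} {len(row["ips"]):5d} {agent}')
```

Feeding it an open access_log file, as in print_report(summarize_by_agent(open("access_log"))), gives the 500 count, the total hits, and the number of distinct source IPs for each user agent in one pass.
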

Responding to a couple questions that weren't really directed to me but I'll answer anyway:
Kevin Walsh wrote:
>There will probably be a message in one of your error.log files at the
>same time as the Apache log message.
Nope!  This error comes up _thousands_ of times some days, and I wouldn't miss that in the
Interchange error log!  All there is is the Apache access_log and error_log, as mentioned before.

Here is my entire Interchange /opt/interchange/error.log file since late yesterday (I'm running
two catalogs):
- - - [29/May/2005:23:50:10 -0700] - - STOP server (28944) on signal TERM
- - - [29/May/2005:23:50:10 -0700] - - STOP page servers (8393,8997,8857,8285,8429,8832) on signal
TERM
- - - [29/May/2005:23:50:13 -0700] - - Vend::Payment::AuthorizeNet payment module initialized,
using Net::SSLeay
- - - [29/May/2005:23:50:13 -0700] - - RPC traffic settings.
- - - [29/May/2005:23:50:20 -0700] - - ...UI is loaded...
- - - [29/May/2005:23:50:20 -0700] - - Interchange V5.2.0
- - - [29/May/2005:23:50:20 -0700] - - Config 'art' at server startup
- - - [29/May/2005:23:50:20 -0700] - - Config 'foundation520' at server startup
- - - [29/May/2005:23:50:22 -0700] - - START server (9464) (UNIX)
- - - [29/May/2005:23:50:22 -0700] - - ALERT: /opt/interchange/etc/socket socket permissions are
insecure; are you sure you want permissions 666?
- - - [29/May/2005:23:50:22 -0700] - - START server (9464) (UNIX)
- - - [29/May/2005:23:50:22 -0700] - - Interchange page server started (process id 9466)
- - - [29/May/2005:23:50:22 -0700] - - Interchange page server started (process id 9468)
- - - [29/May/2005:23:50:22 -0700] - - Interchange page server started (process id 9470)
- - - [29/May/2005:23:50:22 -0700] - - Interchange page server started (process id 9472)
- - - [29/May/2005:23:50:22 -0700] - - Interchange page server started (process id 9474)
65.54.188.109 - - [30/May/2005:00:59:28 -0700] art /q/Robert Duncan.html Unauthorized for that
session pAJsZhcN>robert. Logged.
65.54.188.109 - - [30/May/2005:07:13:09 -0700] art /q/Robert Duncan.html Unauthorized for that
session pAJsZhcN>robert. Logged.

>How have you determined that this is a mod_interchange problem?  I've
>forgotten the details of the original thread; It was a while ago.
I haven't for one ... but based on what Jonathan said I suspect it now.  I've used mod_interchange
for a long time so don't have a non-mod_interchange version of this catalog to compare to.

The worst part is it's not just Googlebot.  I have had problems with Safari on the Mac also ...
but it doesn't seem to be confined to that.  It's causing legitimate traffic to be thrown away:
$ cat access_log | grep -e "\*500\*" | grep -e "www.google.com/search" | less
69.224.122.28*[30/May/2005:04:24:15 -0700]*GET /liRES169.html?mv_pc=froogle HTTP/1.1*http://www.google.com/search?hl=en&lr=&q=original+1939+movie+poster+The+Wizard+of+Oz*Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; YPC 3.0.3; .NET CLR 1.1.4322)*500*0*-
69.113.158.187*[30/May/2005:06:07:24 -0700]*GET /artist/Heywood_Hardy.html HTTP/1.1*http://www.google.com/search?hl=en&lr=&rls=GGLD%2CGGLD%3A2004-51%2CGGLD%3Aen&q=heywood+hardy+going+to+cover*Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)*500*0*-
199.222.170.143*[30/May/2005:06:20:15 -0700]*GET /liNIMARM113.html?mv_pc=adwordsdata HTTP/1.1*http://www.google.com/search?sourceid=navclient&ie=UTF-8&rls=GGLD,GGLD:2005-08,GGLD:en&q=nudes*Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)*500*0*-
66.65.109.89*[30/May/2005:06:45:17 -0700]*GET /q/Nap.html HTTP/1.1*http://www.google.com/search?hl=en&q=%22spiritual+nap%22*Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)*500*0*-
80.99.43.231*[30/May/2005:06:53:31 -0700]*GET /q/Ambro%20Zandos.html HTTP/1.1*http://www.google.com/search?sourceid=navclient&ie=UTF-8&rls=RNWI,RNWI:2004-51,RNWI:en&q=A%2E+Zandos*Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)*500*0*-
162.83.117.116*[30/May/2005:08:58:50 -0700]*GET /liOWP3853C.html?mv_pc=adwordsdata HTTP/1.1*http://www.google.com/search?hl=en&q=titanic&btnG=Google+Search*Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)*500*0*-
162.83.117.116*[30/May/2005:08:58:57 -0700]*GET /liOWP3853C.html?mv_pc=adwordsdata HTTP/1.1*http://www.google.com/search?hl=en&q=titanic&btnG=Google+Search*Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)*500*0*-
162.83.117.116*[30/May/2005:09:01:56 -0700]*GET /liOWP3853C.html?mv_pc=adwordsdata HTTP/1.1*http://www.google.com/search?hl=en&q=titanic&btnG=Google+Search*Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)*500*0*-
162.83.117.116*[30/May/2005:09:02:01 -0700]*GET /liOWP3853C.html?mv_pc=adwordsdata HTTP/1.1*http://www.google.com/search?hl=en&q=titanic&btnG=Google+Search*Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)*500*0*-
65.117.45.194*[30/May/2005:09:11:16 -0700]*GET /q/Ron%20Yrabedra.html HTTP/1.1*http://www.google.com/search?hl=en&q=ron+yrabedra&btnG=Google+Search*Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)*500*0*-
24.16.29.188*[30/May/2005:09:18:10 -0700]*GET /q/Phot.html HTTP/1.1*http://www.google.com/search?q=Seabiscuit+Moves+Ahead+of+War+Admiral,+poster&hl=en&lr=&start=10&sa=N*Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)*500*0*-
....
57 entries like this and 406 successful attempts.
You'll notice mv_pc=adwordsdata in many of the incoming links, since I've had to step up PPC
spending to drive traffic.  But losing that traffic to a 500 page is even worse, since it is paid
for either way....
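
The 57 failures against 406 successes quoted above work out to roughly 12% of Google-referred visits ending in a 500. A small sketch of the same count in Python, assuming one entry per line in the '*'-separated layout of the excerpts (referer in the fourth field, status in the sixth); the function names are hypothetical, and anything other than a 500 is counted as a success:

```python
def google_referral_outcomes(lines, marker="www.google.com/search"):
    """Count (failed, succeeded) requests whose referer contains `marker`."""
    failed = succeeded = 0
    for line in lines:
        parts = line.rstrip("\n").split("*")
        if len(parts) != 8:
            continue  # skip wrapped or malformed lines
        referer, status = parts[3], parts[5]
        if marker not in referer:
            continue
        if status == "500":
            failed += 1
        else:
            succeeded += 1
    return failed, succeeded


def failure_rate(failed, succeeded):
    """Fraction of counted requests that failed; 0.0 when nothing was counted."""
    total = failed + succeeded
    return failed / total if total else 0.0
```

Running this hourly alongside the other reports would show whether the share of lost search visitors is steady or comes in bursts.
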

It was suggested I might run traceroute while Google is accessing the site and try to figure out
what is going on that way ... any advice on this?
Or where might I uncomment a logdebug line in the code to illuminate this more (or is there a
secret debug mode of IC I could use)?

Here is an excerpt from my Apache access_log today.  Note that it doesn't seem specific to any
page and also note there are no corresponding error entries in the interchange error log:

66.249.71.73*[30/May/2005:14:22:09 -0700]*GET /FWBA1800024000.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*200*15529*-
66.249.71.67*[30/May/2005:14:22:11 -0700]*GET /category/Kitchenware.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*200*66077*-
66.249.71.72*[30/May/2005:14:22:11 -0700]*GET /GALT117.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*200*20069*-
66.249.71.73*[30/May/2005:14:22:20 -0700]*GET /FOA112.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*200*21068*-
66.249.64.79*[30/May/2005:14:22:25 -0700]*GET /LASAC-386m.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*200*20906*-
66.249.64.68*[30/May/2005:14:22:29 -0700]*GET /liBEJ0602.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*200*15993*-
66.249.71.18*[30/May/2005:14:22:34 -0700]*GET /liBEJ590.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*200*16058*-
66.249.64.55*[30/May/2005:14:22:55 -0700]*GET /artist/Angelica_Kauffman.html
HTTP/1.0*-*Googlebot/2.1 (+http://www.google.com/bot.html)*500*532*-
66.249.64.79*[30/May/2005:14:22:59 -0700]*GET /q/Boy.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*500*532*-
66.249.71.32*[30/May/2005:14:23:03 -0700]*GET /liDIR3375.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*500*532*-
66.249.71.73*[30/May/2005:14:23:27 -0700]*GET /liMYS1006977.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*500*532*-
66.249.64.33*[30/May/2005:14:23:30 -0700]*GET /liTELSPR563.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*200*17025*-
66.249.71.32*[30/May/2005:14:23:37 -0700]*GET /liMYS1006918.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*200*15119*-
66.249.71.29*[30/May/2005:14:23:40 -0700]*GET /wm107.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*500*532*-
66.249.71.73*[30/May/2005:14:23:44 -0700]*GET /A1107.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*200*20128*-
66.249.64.18*[30/May/2005:14:23:55 -0700]*GET /q/Fields.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*500*532*-
66.249.64.38*[30/May/2005:14:23:59 -0700]*GET /KHFC2233.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*200*18403*-
66.249.64.55*[30/May/2005:14:24:03 -0700]*GET /liTELLE210.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*500*532*-
66.249.64.38*[30/May/2005:14:24:21 -0700]*GET /A1404S.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*200*20168*-
66.249.64.58*[30/May/2005:14:24:28 -0700]*GET /KHF2231.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*200*18411*-
66.249.64.68*[30/May/2005:14:24:32 -0700]*GET /artist/16th_Century_Oriental.html
HTTP/1.0*-*Googlebot/2.1 (+http://www.google.com/bot.html)*500*532*-
66.249.71.39*[30/May/2005:14:24:42 -0700]*GET /liCLICB11.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*200*17930*-
66.249.64.79*[30/May/2005:14:24:45 -0700]*GET /WH31969.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*200*21462*-
66.249.64.33*[30/May/2005:14:25:03 -0700]*GET /liDEVLD210.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*200*17000*-
66.249.71.18*[30/May/2005:14:25:05 -0700]*GET /liROSSPR811.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*200*14347*-
66.249.64.38*[30/May/2005:14:25:09 -0700]*GET /liWEBSFS2052.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*200*16789*-
66.249.64.79*[30/May/2005:14:25:36 -0700]*GET /liBENAA30148.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*500*532*-
66.249.64.68*[30/May/2005:14:25:53 -0700]*GET /liMIGFAR30926.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*500*532*-
66.249.71.69*[30/May/2005:14:26:03 -0700]*GET /liTEL2332.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*200*14980*-
66.249.64.58*[30/May/2005:14:26:10 -0700]*GET /FWBB2200028000.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*500*532*-
66.249.64.58*[30/May/2005:14:26:23 -0700]*GET /A1151.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*200*20147*-
66.249.71.18*[30/May/2005:14:26:32 -0700]*GET /q/Drawing_Modern.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*500*532*-
66.249.71.67*[30/May/2005:14:26:33 -0700]*GET /artist/Celeste_Peters.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*500*532*-
66.249.71.69*[30/May/2005:14:26:40 -0700]*GET /FOA110.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*200*29846*-
66.249.71.73*[30/May/2005:14:26:49 -0700]*GET /liBENAA50047.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*500*532*-
66.249.71.69*[30/May/2005:14:27:22 -0700]*GET /liGAMAAL003C.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*500*532*-
66.249.71.32*[30/May/2005:14:27:33 -0700]*GET /q/Bushes.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*500*532*-
66.249.64.55*[30/May/2005:14:27:41 -0700]*GET /liIMCTFA134.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*200*16173*-
66.249.71.32*[30/May/2005:14:27:49 -0700]*GET /GALT159.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*200*20047*-
66.249.64.68*[30/May/2005:14:28:12 -0700]*GET /category/Vettriano.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*200*66111*-
66.249.71.39*[30/May/2005:14:28:13 -0700]*GET /liCAN1520.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*200*16825*-
66.249.64.68*[30/May/2005:14:28:23 -0700]*GET /q/Periwinkle.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*500*532*-
66.249.64.38*[30/May/2005:14:28:26 -0700]*GET /idrspb10505.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*200*29257*-
66.249.64.58*[30/May/2005:14:28:41 -0700]*GET /liHHC1307230.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*500*532*-
66.249.71.28*[30/May/2005:14:29:00 -0700]*GET /liBENAB30150.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*500*532*-
66.249.64.33*[30/May/2005:14:29:01 -0700]*GET /liBENAB3114.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*200*16788*-
66.249.64.79*[30/May/2005:14:29:02 -0700]*GET /liTEL5491.html HTTP/1.0*-*Googlebot/2.1
(+http://www.google.com/bot.html)*200*17034*-
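
To test the impression that the 500s are not tied to any particular kind of page, the entries can be grouped by the first path segment (/q/, /artist/, /category/, or top-level product pages) and the error rate compared across groups. This is only a sketch in Python under the same assumption as before: one entry per line, eight '*'-separated fields, request in the third field and status in the sixth. The helper names are hypothetical.

```python
from collections import Counter


def path_group(request):
    """Map 'GET /artist/X.html HTTP/1.0' to a coarse group such as '/artist/'."""
    fields = request.split(" ")
    if len(fields) < 2:
        return "(unparsed)"
    path = fields[1].split("?", 1)[0]  # drop any query string
    segments = path.split("/")
    # '/artist/X.html' -> ['', 'artist', 'X.html']; '/A1107.html' -> ['', 'A1107.html']
    return f"/{segments[1]}/" if len(segments) > 2 else "(top level)"


def error_rate_by_group(lines):
    """Return {group: fraction of requests in that group answered with 500}."""
    hits, errors = Counter(), Counter()
    for line in lines:
        parts = line.rstrip("\n").split("*")
        if len(parts) != 8:
            continue  # skip wrapped or malformed lines
        group = path_group(parts[2])
        hits[group] += 1
        if parts[5] == "500":
            errors[group] += 1
    return {group: errors[group] / hits[group] for group in hits}
```

If the rates come out similar for every group, that supports the idea that the failure happens before Interchange ever sees the page request.
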

Thanks again for the help everyone!
Bryan

