[ic] Setting high RobotLimit ... where to set it?

John Young john_young at sonic.net
Thu Sep 16 18:23:42 EDT 2004


Bryan D Gmyrek wrote:

> I have a site with over 100,000 products and want to make sure
> Google and the other spiders can index them all.  In case several
> spiders are spidering the site, I want RobotLimit to be high
> enough to allow them to hammer, but low enough to stop DoS
> attacks.  Any suggestions on a limit?  100, 500, 1000?

RobotLimit doesn't apply to tmp_session/nsession.  That is, if
the spider is identified as such by RobotUA, RobotHost, or
RobotIP, then Interchange doesn't maintain a session for it, as
we generally think of one -- so the access count never
increments.
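
For reference, robot identification is configured with the
global Robot* directives.  A minimal sketch of interchange.cfg
(the patterns here are illustrative placeholders, not a
recommended list):

    # interchange.cfg -- illustrative values only
    RobotUA    Googlebot, *Slurp*, msnbot
    RobotHost  *.googlebot.com, *.crawl.yahoo.net
    RobotIP    66.249

Requests matching any of these are handled as robots, outside
the normal session machinery described above.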

> The documentation is a bit confusing to me.  It says:
> "The RobotLimit directive defines the number of consecutive pages
> a user session may access without a 30 second pause. If the limit
> is exceeded, the command defined in the Global directive
> LockoutCommand will be executed and catalog URLs will be
> rewritten with host 127.0.0.1, sending the robot back to itself.
> The default is 0, disabling the check."
> So does it mean that you could access 200 pages as long as
> there is a 30 second pause between each access, or does it mean
> you can access 200 pages as long as the pauses add up to a
> total of 30 seconds?

It's the number of pages accessed within a 30 second time period.
In a normal user session, IC maintains a count of page accesses.
You load your store's index.html, and accesses++.  Load the
results page, and accesses++.  Load a flypage, accesses++.  If
the number of accesses within the last 30 seconds exceeds
RobotLimit, Interchange writes a warning to the error log,
performs the lockout command if one is configured, and so on.
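
In pseudo-Perl, that logic looks roughly like this (a minimal
sketch of the behavior as described above, not Interchange's
actual source; the variable and sub names are made up for
illustration):

    use strict;
    use warnings;

    my $robot_limit = 100;    # value of the RobotLimit directive
    my @access_times;         # page-access timestamps, this session

    sub record_access {
        my $now = time();
        push @access_times, $now;
        # only accesses within the last 30 seconds count
        @access_times = grep { $now - $_ <= 30 } @access_times;
        if ($robot_limit && @access_times > $robot_limit) {
            warn "RobotLimit exceeded -- possible robot or DoS\n";
            # here Interchange would run the LockoutCommand (if
            # set) and rewrite catalog URLs to host 127.0.0.1
        }
    }

Note that a RobotLimit of 0 (the default) disables the check, so
the comparison never fires.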

HTH,
John Young


