[ic] Setting high RobotLimit ... where to set it?
john_young at sonic.net
Thu Sep 16 18:23:42 EDT 2004
Bryan D Gmyrek wrote:
> I have a site with over 100,000 products and want to make sure
> Google and the other spiders can index them all. In case several
> spiders are spidering the site I want RobotLimit to be high
> enough to allow them to hammer, but low enough to stop DoS
> attacks. Any suggestions on a limit? 100, 500, 1000?
RobotLimit doesn't apply to a tmp_session/nsession. That is,
if the spider is identified as such by RobotUA, RobotHost, or
RobotIP, then Interchange doesn't maintain a session for it, as
we generally think of one -- so the access count never increments.
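To make that concrete, here's a minimal sketch (in Python, not
Interchange's Perl) of how that robot check might work. The pattern
lists below are hypothetical stand-ins for the RobotUA, RobotHost,
and RobotIP directives; the real values live in your interchange.cfg.

```python
import re

# Hypothetical stand-ins for the RobotUA, RobotHost, and RobotIP
# directives; actual values come from interchange.cfg.
ROBOT_UA = [r"Googlebot", r"Slurp", r"msnbot"]
ROBOT_HOST = [r"\.googlebot\.com$"]
ROBOT_IP = ["66.249."]

def is_robot(user_agent, host, ip):
    """Return True if the client matches any robot pattern.
    A matching client is served without a real session, so no
    per-session access counter is kept and RobotLimit never fires."""
    if any(re.search(p, user_agent or "") for p in ROBOT_UA):
        return True
    if any(re.search(p, host or "") for p in ROBOT_HOST):
        return True
    return any((ip or "").startswith(p) for p in ROBOT_IP)
```

So a well-identified spider is exempt; RobotLimit only matters for
clients that look like normal user sessions.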
> The documentation is a bit confusing to me. It says:
> "The RobotLimit directive defines the number of consecutive pages
> a user session may access without a 30 second pause. If the limit
> is exceeded, the command defined in the Global directive
> LockoutCommand will be executed and catalog URLs will be
> rewritten with host 127.0.0.1, sending the robot back to itself.
> The default is 0, disabling the check."
> So does it mean that you could access 200 pages as long as there
> is a 30 second pause between each access, or does it mean you can
> access 200 pages as long as the pauses between accesses add up
> to a total of 30 seconds?
Neither: it's the number of pages accessed within a 30 second window.
In a normal user session, IC maintains a count of page accesses.
You load your store's index.html, and accesses++. Load the
results page, and accesses++. Load a flypage, accesses++.
If accesses within the last 30 seconds > RobotLimit, write
a warning to the error log, perform the lockout command if
configured to do so, etc.
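The counting described above amounts to a sliding-window rate limit.
Here's a minimal Python sketch of that logic -- an illustration of
the technique, not Interchange's actual Perl implementation; the
class and method names are my own.

```python
import time
from collections import deque

class AccessCounter:
    """Per-session page-access counter: flags the session once more
    than robot_limit accesses occur within the last `window` seconds."""

    def __init__(self, robot_limit, window=30.0):
        self.robot_limit = robot_limit
        self.window = window
        self.accesses = deque()  # timestamps of recent accesses

    def hit(self, now=None):
        """Record one page access (accesses++); return True if the
        count within the window now exceeds robot_limit."""
        now = time.monotonic() if now is None else now
        self.accesses.append(now)
        # Drop accesses that fell out of the 30 second window.
        while self.accesses and now - self.accesses[0] > self.window:
            self.accesses.popleft()
        return len(self.accesses) > self.robot_limit
```

With RobotLimit 3, a fourth access inside 30 seconds trips the check;
a later access after a long pause starts a fresh window.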