[ic] Possible bug: Too many new ID assignments for this IP address
John1
list_subscriber at yahoo.co.uk
Wed Aug 24 08:28:45 EDT 2005
On Wednesday, August 24, 2005 2:29 AM, mike at perusion.com wrote:
> Quoting John1 (list_subscriber at yahoo.co.uk):
>> In October 2004 I posted:
>> http://www.icdevgroup.org/pipermail/interchange-users/2004-October/041215.html
>> explaining what I thought was a bug which can result in *permanently*
>> blocked access to Interchange sites from ISPs who use proxy servers.
>>
>> To avoid this problem we are currently running with "RobotLimit 0",
>> so it's not really causing us a problem any more (although it would
>> be nice not to have to use RobotLimit 0).
>>
>> Here is the sub count_ip code (which is still the same as it was in
>> October 2004):
>>
>> sub count_ip {
>>     my $inc = shift;
>>     my $ip = $CGI::remote_addr;
>>     $ip =~ s/\W/_/g;
>>     my $dir = "$Vend::Cfg->{ScratchDir}/addr_ctr";
>>     mkdir $dir, 0777 unless -d $dir;
>>     my $fn = Vend::Util::get_filename($ip, 2, 1, $dir);
>>     if(-f $fn) {
>>         my $grace = $Vend::Cfg->{Limit}{robot_expire} || 1;
>>         my @st = stat(_);
>>         my $mtime = (time() - $st[9]) / 86400;
>>         if($mtime > $grace) {
>>             ::logDebug("ip $ip allowed back in due to '$mtime' > '$grace' days");
>>             unlink $fn;
>>         }
>>     }
>>     return Vend::CounterFile->new($fn)->inc() if $inc;
>>     return Vend::CounterFile->new($fn)->value();
>> }
>>
>> I believe the crux of the problem is that this code checks the last
>> *modified* time, which has the effect of *permanently* blocking large
>> ISPs that use a relatively small number of proxy servers.
>>
>> ########## snippet from my post in October 2004:
>> So, here is the problem: any IP address that is typically allocated
>> more than 1 session id in a 24-hour period will never get its addr_ctr
>> file expired. i.e. There needs to be a full 24-hour period without
>> access from the same IP address before the addr_ctr file will be
>> deleted, thus re-allowing access from that IP address. For large ISPs
>> using a relatively small number of proxy servers this may *never*
>> happen, and so access from their proxy servers is permanently blocked.
>> ##########
>
> I am perfectly willing to believe I have screwed up, but I had thought
> this had been addressed with
>
> Limit robot_expire 0.05
>
> This changes the 24-hour period to one hour. And since the first call
> is always to count_ip() without incrementing the counter (and
> therefore the mtime) the maximum lockout should be that one hour.
>
Do you mean "Since only the first call to count_ip() increments the counter
(and therefore the mtime), the maximum lockout should be that one hour"?
If I am reading the code in count_ip correctly, the addr_ctr/IP file will
only be deleted if its modified time is greater than "Limit robot_expire".
If I understand correctly, the code in sub new_session calls count_ip(1)
(and therefore updates the mtime if the addr_ctr/IP file already exists)
each time a new session is created.
Consequently, the addr_ctr/IP file will keep counting up unless there is a
*gap* of greater than "Limit robot_expire" before a new session id is
requested by the same IP address.
So if you use "Limit robot_expire 0.05", then provided there are at least 2
requests per hour for a new session id from the same IP address, the
addr_ctr/IP file will keep counting up forever.
After a few days or weeks RobotLimit will eventually be exceeded, and the
IP address will then be *permanently* locked out. By permanently I mean
until there is a gap of at least 1 hour between requests for new session
ids from the IP address in question.
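To illustrate the behaviour I am describing, here is a small self-contained
sketch (not Interchange code; the inc_counter sub is a crude stand-in for
Vend::CounterFile->inc(), and the file name is made up). Because expiry is
judged by the file's mtime, every increment resets the clock, so the "age"
never grows while requests keep arriving:

```perl
#!/usr/bin/perl
# Sketch only: shows that each counter write resets the file's mtime,
# so an mtime-based grace period never elapses under steady traffic.
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my $fn  = "$dir/addr_ctr_example";    # hypothetical counter file

sub inc_counter {                     # crude stand-in for CounterFile->inc()
    my $n = 0;
    if (open my $in, '<', $fn) { $n = <$in> || 0; close $in }
    open my $out, '>', $fn or die $!;
    print {$out} $n + 1;              # every write updates the file's mtime
    close $out;
    return $n + 1;
}

my $grace = 0.05 * 86400;             # "Limit robot_expire 0.05" in seconds
for my $req (1 .. 3) {
    my $count = inc_counter();
    my $age   = time() - (stat $fn)[9];
    print "request $req: count=$count, age since last write=${age}s\n";
    # age stays near 0 here, so "$age > $grace" never becomes true
    # as long as new session requests keep coming in.
}
```

The counter climbs on every request while the measured age stays near zero,
which is exactly why RobotLimit is eventually exceeded.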
> If you have such traffic that you assign 100 legitimate IP addresses in
> an hour, it means you would have to have a much better robot defense
> than RobotLimit can supply....
>
So what I am saying above is that you don't need 100 accesses from the IP
address to maintain a lockout; you only need at least 2 each hour to
maintain the lockout situation.
> Also, a normal ISP proxy server should not see this; just if it is
> running behind a NAT. The IP address used is not the IP of the proxy
> server but the IP address of the user as sent by the proxy server.
>
I agree, but for some reason, in the UK at any rate, AOL appear to operate a
NAT proxy setup. I am not sure why they do this, but they seem to. Here are
some of the proxy servers that I found Interchange was blocking until I used
RobotLimit 0:
195.93.34.12 cache-loh-ac06.proxy.aol.com
212.100.251.149 lb1.onspeed.com
62.254.64.12 midd-cache-1.server.ntli.net
62.252.224.13 leed-cache-2.server.ntli.net
62.252.192.5 manc-cache-2.server.ntli.net
62.252.0.5 glfd-cache-2.server.ntli.net
62.254.0.14 nott-cache-2.server.ntli.net
80.5.160.4 bagu-cache-1.server.ntli.net
cache-los-ad06.proxy.aol.com
cache-los-ad02.proxy.aol.com
cache-los-ad03.proxy.aol.com
cache-los-ab04.proxy.aol.com
cache-los-ab01.proxy.aol.com
cache-los-aa02.proxy.aol.com
The ntli.net proxy servers belong to NTL who are a major UK cable TV
provider (and therefore also a large broadband provider).
> I run some pretty busy Interchange servers, and I never see trouble
> with this with the exception of NATs for fair-sized companies
> accessing their own IC server. Even then, the above "Limit" fixes the
> problem.
>
Ummm, it does seem strange that more people have not noticed this problem
(although a few postings to the list suggest that I am not alone). I can
accept that I may be jumping to the wrong conclusion about the cause (and
perhaps I have missed something about how the code works), but hopefully I
have not misunderstood.
If I have understood correctly, then perhaps one solution would be to purge
the addr_ctr directory at regular intervals, say every 24 hours. That way,
with a high enough RobotLimit and a low enough Limit robot_expire, the
addr_ctr would be purged before RobotLimit was ever exceeded. Perhaps an
AddrCtrExpire configuration directive could be added to do this?
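The purge idea could be as simple as the following sketch (my own code, not
anything in Interchange; the directory it demonstrates on is a throwaway
stand-in for ScratchDir/addr_ctr, and the hypothetical purge_addr_ctr sub
would be run from cron or a hypothetical AddrCtrExpire hook):

```perl
#!/usr/bin/perl
# Sketch: unconditionally empty an addr_ctr-style directory so that no
# counter file can accumulate past RobotLimit between purges.
use strict;
use warnings;
use File::Temp qw(tempdir);

sub purge_addr_ctr {
    my ($dir) = @_;
    my $removed = 0;
    opendir my $dh, $dir or die "cannot open $dir: $!";
    for my $entry (readdir $dh) {
        my $path = "$dir/$entry";
        $removed++ if -f $path && unlink $path;   # skips . and ..
    }
    closedir $dh;
    return $removed;
}

# Demonstrate on a throwaway directory standing in for ScratchDir/addr_ctr:
my $dir = tempdir(CLEANUP => 1);
for my $ip (qw(195_93_34_12 62_254_64_12)) {
    open my $fh, '>', "$dir/$ip" or die $!;
    print {$fh} "100\n";    # a count already past most RobotLimit settings
    close $fh;
}
my $n = purge_addr_ctr($dir);
print "purged $n counter file(s)\n";   # prints "purged 2 counter file(s)"
```

Note this flat purge ignores the 2-level subdirectory layout that
Vend::Util::get_filename creates; a real version would need to recurse.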