[ic] mod_interchange and Apache MaxClients

Thu Nov 17 06:11:36 EST 2005

On Thursday, November 17, 2005 7:02 AM, rphipps at reliant-solutions.com wrote:

<snip>
>>> I wrote a script yesterday that runs via cron and what it does is it
>>> checks if a certain ic page returns a certain string.  If the page
>>> does not return properly it tries again, up to 5 times.  If after 5
>>> times it doesn't get the page back it restarts Interchange as well
>>> as emails an address to alert that the site restarted.  This way we
>>> see the problem before MaxClients is reached since I have figured
>>> out that MaxClients is not reached until some undetermined amount
>>> of time after the site stops processing requests.
>>>
>> So have you found/confirmed that the client count does keep
>> ticking upwards once the site stops processing requests?
>
> No, I have not done this, how would I go about finding this out?
>
Well, I just use the "top" command and look at the total number of processes.
I know that if the number of processes is up at around 350 then MaxClients
has been reached.  Incidentally, I use mod_log_sql to send the Apache log to
a MySQL database, so for every Apache process a MySQL process is also
launched - hence the approx. 350 processes in total.

Alternatively, just use "ps -elf | grep -c httpd" to see only the number of
Apache processes.  On my server this is exactly 150 once MaxClients is
reached (i.e. my MaxClients setting).  Actually, now I come to think of it,
I think it fluctuates between, say, 147 and 150, so perhaps the Apache
processes are dying off after all; it's just that Interchange is
consistently feeding it requests (presumably because the original one
failed), i.e. a never-ending loop.  Yes, actually, this makes a lot of sense,
doesn't it?  To quote Kevin:

> Requests that are waiting in the queue will look as if they are
> hanging.  Stopping and resubmitting the requests will probably
> just make matters worse.
>
So could the situation be that once MaxClients has been reached, Interchange
starts to get timeouts from Apache and so continually resends requests?
Actually, the more I think about this, the more I realise I could do with
understanding a little more about the communication sequence between
Apache, mod_interchange and Interchange itself.

Kevin (or someone else), would you mind providing a brief overview of the
communication sequence and "handshaking" between Apache, mod_interchange and
the Interchange daemon?

> <snip>
>
>> Fantastic, thanks Ron, just implemented it!  It occurs to me that it
>> may be useful to call a few system commands before restarting
>> Interchange, and to add the output from these commands to the alert
>> e-mail sent.
>
> No problem, I needed something to keep the site up at night and when I
> was away from my computer, this should help out.
>
>> e.g.
>>
>> # Number of connections to Apache just before Interchange is restarted
>> netstat -nt | grep :80 | wc -l
>> netstat -nt | grep :443 | wc -l
>>
>> # Number of httpd, interchange and mysql processes running just before
>> # Interchange is restarted
>> ps -elf | grep -c httpd
>> ps -elf | grep -c interchange
>> ps -elf | grep -c mysqld
>>
>> # Number of connections each IP address has to the server just before
>> # Interchange is restarted
>> netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n
>>
>> Although I could just about work out how to insert these into your
>> script, it would involve educated guesswork and no doubt ugly coding,
>> so I would be grateful if you could insert these system calls or any
>> other similar or additional commands that you think may provide
>> useful information.
>>
>>> You could modify the checkic.html page to also do db lookups to
>>> verify the db connection is still valid.
>>>
>> If you think this would be useful, would you mind also adding this, as
>> my Perl isn't up to it.
>>
>> Thanks for your help!
>>
>
> I'll give it a try.  I'm interested to see these values when the server
> is going down, although I think these values may be a bit off since
> the server will be down or on its way down when this is run; however,
> it still may help.
>
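
For what it's worth, here is a rough shell sketch of how those commands could be wrapped up so their output can be appended to the alert e-mail.  The function names are my own (hypothetical), and it assumes netstat and a POSIX-style ps are available.  It matches on the local-port column rather than grepping for ":80" anywhere in the line (which would also catch :8080 and remote ports), and on the exact command name so that grep does not count itself.

```shell
#!/bin/sh
# Count TCP connections whose LOCAL port is $1.
# Reads "netstat -nt" output on stdin; column 4 is the local address.
count_port_conns() {
    awk -v port="$1" '$1 ~ /^tcp/ && $4 ~ (":" port "$") { n++ } END { print n + 0 }'
}

# Count processes whose command name is exactly $1.
# Reads "ps -e -o comm=" output (one command name per line) on stdin.
count_by_name() {
    grep -c -x -F -- "$1" || true
}

# Connections per remote IPv4 address, busiest last.
# Reads "netstat -nt" output on stdin; column 5 is the foreign address.
conns_per_ip() {
    awk '$1 ~ /^tcp/ { split($5, a, ":"); print a[1] }' | sort | uniq -c | sort -n
}

# Hypothetical wrapper: print a report to include in the alert e-mail.
# Assumption: the Interchange daemon shows up as "interchange"; on some
# systems it may appear under a different command name (e.g. perl).
diagnostics_report() {
    snapshot=$(netstat -nt 2>/dev/null)
    procs=$(ps -e -o comm=)
    echo "Connections on port 80:  $(printf '%s\n' "$snapshot" | count_port_conns 80)"
    echo "Connections on port 443: $(printf '%s\n' "$snapshot" | count_port_conns 443)"
    for name in httpd interchange mysqld; do
        echo "$name processes: $(printf '%s\n' "$procs" | count_by_name "$name")"
    done
    echo "Connections per remote IP:"
    printf '%s\n' "$snapshot" | conns_per_ip
}
```

Calling diagnostics_report just before the restart and capturing its output should be enough to get the numbers into the mail body.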

######## Snippet
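use LWP::UserAgent;

# $url (the check page) is set elsewhere in the script.
my $browser = LWP::UserAgent->new;
$browser->timeout(30);    # seconds allowed per attempt

my $count = 0;
my $up = 0;

# Try up to 5 times; stop at the first response containing "UP".
while ($count < 5) {
        my $response = $browser->get($url);
        if ($response->content =~ m/UP/) {
                $up = 1;
                last;
        }
        $count++;
}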
##########

Ron, does this code mean that it takes your script 2.5 minutes to recognise
that the server is down (i.e. 5 tries x 30 second timeout)?  If so, would it
be better to reduce the timeout to about 5 seconds?  Hopefully 5 tries with
a 5 second timeout shouldn't cause any false alarms.  That way we will spot
the server going down more quickly, and the various system commands I have
suggested inserting may then give more useful information.  Perhaps we can
get away with less than a 5 second timeout - what do you think?
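
To put numbers on that, here is a rough shell equivalent of the check with the tries and timeout as parameters.  It is only a sketch: it assumes curl is installed, fetch_page and check_up are hypothetical names, and the URL in the usage line is a placeholder.  Note that curl's --max-time caps the whole transfer, whereas as far as I know LWP's timeout is an inactivity timeout, so the Perl script's worst case could be somewhat longer than tries x timeout.

```shell
#!/bin/sh
# Worst-case seconds before the check gives up: tries x per-try timeout.
worst_case_seconds() {
    echo $(( $1 * $2 ))
}

# Fetch a page with a hard cap of $1 seconds; $2 is the URL.
fetch_page() {
    curl -s --max-time "$1" "$2"
}

# Try up to $1 times, $2 seconds per try, to fetch URL $3.
# Returns 0 as soon as the body contains the marker string "UP",
# or 1 if every attempt fails.
check_up() {
    tries=$1
    timeout=$2
    url=$3
    i=0
    while [ "$i" -lt "$tries" ]; do
        if fetch_page "$timeout" "$url" | grep -q UP; then
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}
```

For example, "worst_case_seconds 5 30" gives 150 and "worst_case_seconds 5 5" gives 25, and something like "check_up 5 5 http://www.example.com/checkic.html || echo down" would run the shorter check.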

BTW, I have this running via cron at a 1 minute interval - are you doing
the same?  Thanks
