[ic] Restart and stop problems (multiple processes)

Mike Heins interchange-users@icdevgroup.org
Thu Feb 6 10:17:01 2003


Quoting Daniel Hutchison (jdhutchison11@attbi.com):
> On Wed, 2003-02-05 at 14:41, Mike Heins wrote:
> > Quoting Dorothy Puma (dorothy@digilink.net):
> > > Sach Jobb wrote:
> > > >>Has anyone else seen this same behavior with restarting or stopping the
> > > >>interchange process, and have an idea on how to fix it?
> > > > 
> > > > 
> > > > Mine works okay but i'm using OpenBSD.
> > > > 
> > > > However, you should be able to do this manually, no? If it's Slowlaris you
> > > > could do something like 'ps -ef | grep interchange'. Look at what the PID
> > > > is and simply kill it ('kill $PID'). Then just start it normally.
> > > > 
> > > > I suppose you could automate this process using some sort of combination
> > > > involving 'ps' and 'cut' (or just making your own pid file?) but since the
> > > > program 'interchange' is itself simply a perl script perhaps the solution
> > > > lies in hacking it.
> > > 
> > > I have been doing the kill manually, it's just a pain. I've been running
> > > interchange since the good old minivend days and never had this problem.
> > > I would hack the perl script, but wouldn't know where to start :-)
> > > 
> > 
> > I would like to fix this, and I imagine it is simple.
> > 
> > Unfortunately I have no more Solaris machines and have no clients
> > who use them, so I cannot test this. Without some authoritative
> > debug info and without a platform to test on, there is not much I can do.
> 
> FWIW, I did try to look into this issue a little more in depth. However,
> I haven't had a whole lot of time. I haven't found anything definite yet,
> just some suspicions.
> 
> From what I can tell, the problem seems to be in the locking of the pid
> file.  That is, interchange attempts to lock the pid file when it starts
> up.  If it can't lock the pid file, it assumes another interchange
> process is running.  What I suspect is that interchange locks the pid
> file before it forks, and that on Solaris locks created with flock()
> aren't inherited across forks.  As a result, when the parent process
> exits the pid file becomes unlocked.  When interchange is then run with
> the shutdown command it detects that the pid file is unlocked and thinks
> that there isn't a running interchange process.
> 
> What I have done is verify that the default install of interchange on my
> Solaris box uses the flock() function to lock the pid file.  I've also
> created a mini Perl program that just locks files, based on the code in
> interchange.  The file locking works fine until I throw a fork() in
> it...
> 
> Anyway, I hope this helps a bit.  
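
For anyone who wants to repeat the lock-then-fork experiment described
above, here is a rough sketch of that kind of test. It is written in
Python rather than Perl, and it is neither the code from interchange nor
the test program mentioned above. The function name
lock_survives_parent_exit is made up for illustration. It assumes a
Unix-like system with fork() and a working fcntl.flock(); what it reports
will depend on how flock() is implemented on the platform.

```python
import fcntl
import os
import tempfile


def lock_survives_parent_exit(path):
    """Lock path, fork, let the locking process exit, then probe the lock.

    Returns True if a fresh LOCK_EX|LOCK_NB attempt is still refused while
    only the forked child holds the inherited file handle.
    """
    release_r, release_w = os.pipe()  # closing release_w tells the child to quit
    ready_r, ready_w = os.pipe()      # the locker writes here once it has forked

    locker = os.fork()
    if locker == 0:
        # Plays the role of the starting process: lock first, fork second.
        try:
            os.close(ready_r)
            os.close(release_w)
            handle = open(path, "a")
            fcntl.flock(handle, fcntl.LOCK_EX)
            if os.fork() == 0:
                # Long-lived child: keeps the inherited handle until released.
                os.close(ready_w)
                os.read(release_r, 1)  # returns at EOF
                os._exit(0)
            os.write(ready_w, b"x")
        finally:
            os._exit(0)                # the process that called flock() exits

    os.close(release_r)
    os.close(ready_w)
    os.read(ready_r, 1)                # wait until the lock is taken and forked
    os.waitpid(locker, 0)              # the locking process is now gone

    with open(path, "a") as probe:     # fresh handle, as a "stop" command would use
        try:
            fcntl.flock(probe, fcntl.LOCK_EX | fcntl.LOCK_NB)
            survived = False           # lock was granted, so it did not survive
        except BlockingIOError:
            survived = True            # still held through the forked child

    os.close(release_w)                # let the long-lived child exit
    os.close(ready_r)
    return survived


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:
        print("lock survived fork + parent exit:",
              lock_survives_parent_exit(tmp.name))
```

A result of False would be consistent with the suspicion above that the
lock goes away when the parent exits; True would point elsewhere.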

Turns out the files lock fine, but LOCK_NB is not working, at least on the
Solaris server I tested on (thanks Dorothy). It fails whether or not a
fork has happened. In any case, grab_pid happens in the context of the last
fork, as I thought.
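
A quick way to check whether non-blocking locks behave on a given box is
something like the sketch below. It is Python, not the Perl used by
interchange, and the function name probe_nonblocking_lock and the
5-second timeout are arbitrary choices for illustration. One process
holds an exclusive flock() and a forked child opens the same file itself
and asks for LOCK_EX|LOCK_NB, which should be refused immediately.

```python
import fcntl
import os
import signal
import tempfile


def probe_nonblocking_lock(path, timeout=5):
    """Hold an exclusive flock() on path, then let a child try LOCK_EX|LOCK_NB.

    Returns "refused" (the expected result), "granted" (lock not enforced),
    "blocked" (the child hung and was killed by the alarm), or "error".
    """
    with open(path, "a") as holder:
        fcntl.flock(holder, fcntl.LOCK_EX)
        pid = os.fork()
        if pid == 0:
            # Child: open the file itself so it does not share the parent's lock.
            signal.alarm(timeout)      # SIGALRM kills us if flock() blocks anyway
            code = 2                   # any unexpected failure
            try:
                with open(path, "a") as contender:
                    try:
                        fcntl.flock(contender, fcntl.LOCK_EX | fcntl.LOCK_NB)
                        code = 1       # lock granted although already held
                    except BlockingIOError:
                        code = 0       # refused at once, as LOCK_NB should do
            finally:
                os._exit(code)
        _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return "blocked"
    return {0: "refused", 1: "granted"}.get(os.WEXITSTATUS(status), "error")


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:
        print("LOCK_NB probe:", probe_nonblocking_lock(tmp.name))
```

Anything other than "refused" would suggest that the non-blocking check
used to detect a running server cannot be trusted on that system.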

I could add a -badlock option on the command line, but it would seem
to make more sense to just fix Perl on the affected systems.

-- 
Mike Heins
Perusion -- Expert Interchange Consulting    http://www.perusion.com/
phone +1.513.523.7621      <mike@perusion.com>

Just because something is obviously happening doesn't mean something
obvious is happening. --Larry Wall