Defunct Processes

Jonathan I. Kamens jik at athena.mit.edu
Thu Feb 14 10:40:53 AEST 1991


Here's something I wrote a while back to deal with the question of reaping
child processes:

  Unfortunately, it's impossible to generalize how the death of child
processes should behave, because the exact mechanism varies over the
various flavors of Unix.  Perhaps someone who's "in the know" (or at
least more so than I am) about POSIX can tell us what the POSIX standard
behavior (if there is any) for this is.

  First of all, by default, you have to do a wait() for child processes
under ALL flavors of Unix.  That is, there is no flavor of Unix that I
know of that will automatically clean up child processes that exit
unless you do something to tell it to do so.
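
  As a minimal sketch of that default case (the function name
run_child_and_wait is made up for illustration, and the status is
assumed to be a plain int as on current systems), a parent that forks a
child and then waits for it might look like this:

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`, wait()
   for it, and return its exit status (or -1 on any failure). */
int run_child_and_wait(int code)
{
    int status;
    pid_t pid, w;

    pid = fork();
    if (pid < 0)
        return -1;              /* fork failed */
    if (pid == 0)
        _exit(code);            /* child: exit immediately */

    /* Parent: keep calling wait() until our child is the one reaped.
       Without this, the child would linger as a defunct process. */
    while ((w = wait(&status)) != pid) {
        if (w < 0)
            return -1;          /* no children left, or wait() failed */
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_child_and_wait(7) should hand back 7 once the child has
been reaped.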

  Second, allegedly, under some SysV-derived systems, if you do
"signal(SIGCHLD, SIG_IGN)", then child processes will be cleaned up
automatically, with no further effort on your part.  However, people
have told me that they've never seen this actually work; the best way to
find out if it works at your site is to try it, although if you are
trying to write portable code, it's a bad idea to rely on this in any case.
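
  One way to "try it" is sketched below.  This is only a probe, not a
guarantee: the function name sigchld_ign_reaps is hypothetical, and it
assumes waitpid() is available and fails with ECHILD when the child was
cleaned up automatically.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical probe: returns 1 if ignoring SIGCHLD made the system
   clean up the child by itself, 0 if we still had to reap it, and
   -1 if the probe could not be run. */
int sigchld_ign_reaps(void)
{
    void (*old)(int);
    pid_t pid, w;
    int result;

    old = signal(SIGCHLD, SIG_IGN);
    if (old == SIG_ERR)
        return -1;

    pid = fork();
    if (pid < 0) {
        signal(SIGCHLD, old);
        return -1;
    }
    if (pid == 0)
        _exit(0);               /* child: exit immediately */

    /* Assumption: where SIG_IGN works, waitpid() blocks until the
       child is gone and then fails with ECHILD; elsewhere it returns
       the child's pid as usual. */
    do {
        w = waitpid(pid, NULL, 0);
    } while (w < 0 && errno == EINTR);

    if (w == pid)
        result = 0;             /* we had to reap it ourselves */
    else if (w < 0 && errno == ECHILD)
        result = 1;             /* the system cleaned it up for us */
    else
        result = -1;

    signal(SIGCHLD, old);       /* put the previous disposition back */
    return result;
}
```

A result of 1 only tells you about the system you ran it on, so it is
no reason to rely on the behavior in portable code.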

  If you can't use SIG_IGN to force automatic clean-up, then you've got
to write a signal handler to do it.  It isn't easy at all to write a
signal handler that does things right on all flavors of Unix, because of
the following inconsistencies:

  On some flavors of Unix, the SIGCHLD signal handler is called if one
*or more* children have died.  This means that if your signal handler
only does one wait() call, then it won't clean up all of the children. 
Fortunately, I believe that all Unix flavors for which this is the case
have available to the programmer the wait3() call, which allows the
WNOHANG option to check whether or not there are any children waiting to
be cleaned up.  Therefore, on any system that has wait3(), your signal
handler should call wait3() over and over again with the WNOHANG option
until there are no children left to clean up.
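
  A looping handler of that kind could be sketched as follows.  The
names reap_children and reaped_children are made up for illustration,
the _DEFAULT_SOURCE define is an assumption about what glibc needs
before it declares wait3(), and the status argument is assumed to be an
int pointer as on current systems:

```c
#define _DEFAULT_SOURCE         /* assumption: makes glibc declare wait3() */
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical counter, only here so a caller can see how many
   children the handler has cleaned up. */
static volatile sig_atomic_t reaped_children = 0;

/* SIGCHLD handler: one signal may stand for several dead children,
   so keep calling wait3() with WNOHANG until nothing is left. */
void reap_children(int signo)
{
    int saved_errno = errno;    /* wait3() may change errno */
    int status;

    (void)signo;
    while (wait3(&status, WNOHANG, NULL) > 0)
        reaped_children++;
    errno = saved_errno;
}
```

It would be installed with signal(SIGCHLD, reap_children) before the
first fork().  Because of WNOHANG, the loop stops as soon as wait3()
reports that no exited child remains, so the handler never blocks.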

  On SysV-derived systems, SIGCHLD signals are regenerated if there are
child processes still waiting to be cleaned up after you exit the
SIGCHLD signal handler.  Therefore, it's safe on most SysV systems to
assume when the signal handler gets called that you only have to clean
up one child, and assume that the handler will get called again if
there are more to clean up after it exits.

  On older systems, signal handlers are automatically reset to SIG_DFL
when the signal handler gets called.  On such systems, you have to put
"signal(SIGCHLD, catcher_func)" (where "catcher_func" is the name of
the handler function) as the first thing in the signal handler, so that
it gets reset.  Unfortunately, there is a race condition which may cause
you to get a SIGCHLD signal and have it ignored between the time your
handler gets called and the time you reset the signal.  Fortunately,
newer implementations of signal() don't reset the handler to SIG_DFL
when the handler function is called.
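
  A handler written that way might look like the sketch below.  It does
one wait() per invocation; the handler_calls counter is hypothetical
and only there so the effect can be observed, and the race described
above is still present on systems that reset the handler.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical counter of how many times the handler has run. */
static volatile sig_atomic_t handler_calls = 0;

/* SIGCHLD handler for systems that may reset the disposition to
   SIG_DFL on delivery: re-install first, then reap one child. */
void catcher_func(int signo)
{
    int saved_errno = errno;    /* wait() may change errno */

    (void)signo;
    signal(SIGCHLD, catcher_func);  /* harmless where it is not needed */
    handler_calls++;
    wait(NULL);                 /* clean up one child per invocation */
    errno = saved_errno;
}
```

On systems that keep the handler installed, the extra signal() call is
redundant but costs little.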

  The summary of all this is that on systems that have wait3(), you
should use that and your signal handler should loop, and on systems that
don't, you should have one call to wait() per invocation of the signal
handler.  Also, if you want to be 100% safe, the first thing your
handler should do is reset the handler for SIGCHLD, even though it isn't
necessary to do this on most systems nowadays.

-- 
Jonathan Kamens			              USnail:
MIT Project Athena				11 Ashford Terrace
jik at Athena.MIT.EDU				Allston, MA  02134
Office: 617-253-8085			      Home: 617-782-0710


