SIGCHLD-wait question

Jerre Bowen bowen at sgi.com
Sat Mar 17 11:05:55 AEST 1990


From: bowen at sgi.com (Jerre Bowen)

Folks:

	I'm wondering if there is an easy way in POSIX to be absolutely 
certain that a process which calls a library routine that forks and waits
on a child does not lose any SIGCLDs.  I apologize for the length of this
article.  Here's the scenario:


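#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

void cldhandler();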

pid_t pid;

main()
{
	sigset_t mtmask;
	struct sigaction action;

	sigemptyset(&mtmask);	/* sigsuspend with no sigs blocked */

	/* SIGCLD handler runs with SIGCLD blocked */
	sigemptyset(&action.sa_mask);
	sigaddset(&action.sa_mask, SIGCLD);
	action.sa_handler = cldhandler;
	action.sa_flags = 0;
	sigaction(SIGCLD, &action,NULL);

	if ( (pid = fork()) == 0) {
		sleep(1);
		exit(0);
	}
	else {
		forkit();
		sigsuspend(&mtmask);	/* will parent awaken? */
	}
}
	
void
cldhandler(sig)
int sig;
{
	int stat;	/* child status; unused beyond the reap */

	waitpid(pid, &stat, (WNOHANG|WUNTRACED));
}

forkit()
{
	struct sigaction act, oact;

	act.sa_handler = SIG_DFL;
	sigemptyset(&act.sa_mask);	/* sigset_t is opaque; don't assign 0 */
	act.sa_flags = 0;
	sigaction(SIGCLD, &act, &oact);	/* default handling for SIGCLD */
	<process forks and execs a program which runs for at least 1 sec>
	<process does a waitpid() on its child process>
	sigaction(SIGCLD, &oact, NULL);	/* reinstall prior handling */
}


	The problem here is that the original child of the parent will 
exit while forkit() is executing, and since SIGCLD is SIG_DFL'ed during
that time, a zombie *will* be created, but the SIGCLD will *not* be delivered.
The parent then suspends waiting for the SIGCLD indicating that
its child exited, which of course never arrives.  (Obviously, I am
primarily concerned about the case where forkit() is a library routine, and
the user has no idea what the routine is doing with signals--and
*shouldn't* need to either.)
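
	For readers who want to reproduce the scenario without depending on
sleep() timing, here is a minimal sketch.  It is an illustration under stated
assumptions, not part of the original program: the names on_sigchld and
sigchld_lost_across_default are hypothetical, the child is forked while
SIGCHLD is already SIG_DFL so that the outcome is deterministic, and
waitid() with WNOWAIT is assumed to be available to observe the zombie
without reaping it.  Whether the handler runs when it is reinstalled is
exactly the behavior the standard leaves open, so the function reports what
it saw rather than assuming an answer.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigchld = 0;

static void on_sigchld(int sig)
{
    (void)sig;
    got_sigchld = 1;
}

/*
 * Hypothetical probe.  Returns 1 if the child became a zombie and the
 * handler never ran (the SIGCHLD was lost), 0 if the handler ran,
 * and -1 on error.
 */
static int sigchld_lost_across_default(void)
{
    struct sigaction handler_act, dfl_act, saved;
    siginfo_t info;
    int status, lost;
    pid_t child;

    got_sigchld = 0;
    handler_act.sa_handler = on_sigchld;
    sigemptyset(&handler_act.sa_mask);
    handler_act.sa_flags = 0;
    if (sigaction(SIGCHLD, &handler_act, &saved) < 0)
        return -1;

    /* Stand-in for the library routine: default SIGCHLD handling. */
    dfl_act.sa_handler = SIG_DFL;
    sigemptyset(&dfl_act.sa_mask);
    dfl_act.sa_flags = 0;
    sigaction(SIGCHLD, &dfl_act, NULL);

    child = fork();
    if (child < 0) {
        sigaction(SIGCHLD, &saved, NULL);
        return -1;
    }
    if (child == 0)
        _exit(0);

    /* Wait until the child has exited, but leave it as a zombie. */
    if (waitid(P_PID, (id_t)child, &info, WEXITED | WNOWAIT) < 0) {
        sigaction(SIGCHLD, &saved, NULL);
        return -1;
    }

    /* The "library" returns: reinstall the application's handler. */
    sigaction(SIGCHLD, &handler_act, NULL);

    /* If the implementation re-raised SIGCHLD, the handler has run. */
    lost = (got_sigchld == 0);

    /* Reap the zombie and restore the caller's disposition. */
    if (waitpid(child, &status, 0) != child)
        lost = -1;
    sigaction(SIGCHLD, &saved, NULL);
    return lost;
}
```

A result of 1 means a parent that then called sigsuspend() would have hung;
a result of 0 means the implementation raised SIGCHLD when the handler was
established.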

	SysV solves this problem in signal() and sigset() by checking for
zombied children at the end of the kernel code for those calls and, if any
exist, re-raising a SIGCLD, thus creating the impression that it is
impossible to lose a SIGCLD.

	BSD requires the user to get around the problem of lost SIGCHLDs
by calling wait3(WNOHANG) until no more children remain to be reaped
whenever one SIGCHLD is received.  But in a BSD version of the above code,
you never get any SIGCHLD, so the parent hangs.
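
	The BSD convention described above can be sketched as follows.  This
is a hedged illustration only: it uses waitpid(-1, ..., WNOHANG), the POSIX
counterpart of wait3(WNOHANG), and the names reap_all and spawn_and_reap are
hypothetical.  Note that the loop only helps once at least one SIGCHLD has
actually been delivered; it does nothing for the case where none arrives.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t reaped = 0;

/* One SIGCHLD may stand for several exits, so reap until none are left. */
static void reap_all(int sig)
{
    int saved_errno = errno;
    int status;

    (void)sig;
    while (waitpid(-1, &status, WNOHANG) > 0)
        reaped++;
    errno = saved_errno;
}

/*
 * Hypothetical driver: fork n children that exit at once and return the
 * number reaped by the handler, or -1 on error.
 */
static int spawn_and_reap(int n)
{
    struct sigaction sa, saved;
    sigset_t block, old, wait_mask;
    int i, result;

    assert(n > 0);
    reaped = 0;

    sa.sa_handler = reap_all;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGCHLD, &sa, &saved) < 0)
        return -1;

    /* Keep SIGCHLD blocked except while suspended, to avoid a race. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);
    wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);

    for (i = 0; i < n; i++) {
        pid_t p = fork();
        if (p == 0)
            _exit(0);
        if (p < 0) {
            n = i;              /* only wait for the children we have */
            break;
        }
    }

    while (reaped < n)
        sigsuspend(&wait_mask);

    result = (int)reaped;
    sigprocmask(SIG_SETMASK, &old, NULL);
    sigaction(SIGCHLD, &saved, NULL);
    return result;
}
```

Because pending signals are not queued, several children exiting close
together may produce a single SIGCHLD; the loop in reap_all is what keeps
the count correct in that case.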

	POSIX has provided waitpid in order to allow library routines
such as system(3) and popen/pclose(3), which need to fork and wait for
child processes, to be implemented reliably even in the case that the
calling program has child processes that may terminate while in the
library routines.  But the above program example shows that a conforming
implementation still does not necessarily allow an application program
to depend on facilities like system(3).  The reason is that POSIX explicitly
leaves undefined the question of whether SIGCHLD is raised when a process
with a terminated child for which it has not waited establishes a handler
for SIGCHLD (see section 3.3.1.3 paragraph 3(e)).  One way in which an
implementation can make the above program work properly is to raise
SIGCHLD in this case (i.e. whenever a process with an outstanding zombie
calls sigaction to set a handler for SIGCHLD).

	Is there a compelling reason for the standard not to require this
behavior?  Granted, the implementor has the ability to make things work
correctly.  But if the behavior isn't required, the writer of conforming
applications can't depend on it.
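
	One pattern an application might use when it cannot depend on the
re-raise is to poll for the child with SIGCHLD blocked and suspend only if
the child is still running.  The sketch below is an assumption-laden
illustration, not a claim about what the standard intends: await_child,
wake_on_sigchld and demo_await are hypothetical names, a SIGCHLD handler
must already be installed when await_child is called, and the demo uses
waitid() with WNOWAIT only to force the "child died while SIGCHLD was
SIG_DFL" case.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The handler's only job is to make sigsuspend() return. */
static void wake_on_sigchld(int sig)
{
    (void)sig;
}

/*
 * Wait for `child` without relying on a SIGCHLD having been delivered.
 * Returns the child's exit status, or -1 on error or abnormal exit.
 */
static int await_child(pid_t child)
{
    sigset_t block, old, wait_mask;
    int status = 0;
    pid_t r;

    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);
    wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);

    /* Check first; sleep only if the child is still running. */
    while ((r = waitpid(child, &status, WNOHANG)) == 0)
        sigsuspend(&wait_mask);

    sigprocmask(SIG_SETMASK, &old, NULL);
    if (r != child || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Hypothetical demo: the child dies while SIGCHLD is SIG_DFL. */
static int demo_await(void)
{
    struct sigaction act, dfl, saved;
    siginfo_t info;
    pid_t child;
    int rc;

    act.sa_handler = wake_on_sigchld;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    if (sigaction(SIGCHLD, &act, &saved) < 0)
        return -1;

    dfl.sa_handler = SIG_DFL;       /* stand-in for the library routine */
    sigemptyset(&dfl.sa_mask);
    dfl.sa_flags = 0;
    sigaction(SIGCHLD, &dfl, NULL);

    child = fork();
    if (child < 0) {
        sigaction(SIGCHLD, &saved, NULL);
        return -1;
    }
    if (child == 0)
        _exit(7);

    /* Let the child become a zombie before the handler comes back. */
    waitid(P_PID, (id_t)child, &info, WEXITED | WNOWAIT);
    sigaction(SIGCHLD, &act, NULL);

    rc = await_child(child);        /* returns even if no signal arrives */
    sigaction(SIGCHLD, &saved, NULL);
    return rc;
}
```

Since the zombie is found by the first waitpid() call, the parent never
reaches sigsuspend() in the lost-signal case; if the child is still alive,
the blocked SIGCHLD stays pending until sigsuspend() unblocks it.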

	Is there some other, better solution to the problem posed by the sample
program?

		Thanks -- Jerre Bowen  (bowen at sgi.com)

Volume-Number: Volume 18, Number 79


