bug in pclose(3)

Tue Dec 20 13:48:43 AEST 1988

I just tripped over a bug in the pclose library routine.  If you do a
pclose on a stream (one that was opened with popen, of course), in
some circumstances it will hang forever.  For example:

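    FILE *pin, *pout1, *pout2;
    char line[BUFSIZ];

    pin = popen( "comm1", "r" );
    pout1 = popen( "comm2", "w" );
    pout2 = popen( "comm3", "w" );

    /* read pin until eof; here each line goes to both outputs,
       but any split of the input between them will do */
    while( fgets( line, sizeof line, pin ) != NULL ) {
        fputs( line, pout1 );
        fputs( line, pout2 );
    }

    pclose( pout1 );
    pclose( pin );		/* this one hangs */
    pclose( pout2 );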

If the pclose( pin ) is swapped with either of the other two
pcloses, then there is no hang.  If it is first, then all works
perfectly; if it is last, then it fails with the error "no child".

It is clear what is happening.  The pclose( pout1 ) closes its
file descriptor and then does a wait to get the status of the
pipe child, but it first gets (and discards) the status of the
pin pipe child, which has already terminated.  When the
pclose( pin ) is done, it closes the file descriptor and then
tries to wait for its child.  But that child's status has already
been discarded, and a different child (the pout2 one) still
exists, so the system can't return an error and the wait just
hangs: the pout2 child won't terminate until it gets eof, and
that won't happen until this process closes the file descriptor.
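
The lost status can be reproduced without popen at all.  What follows
is only a sketch, assuming pclose is built around the usual loop on
wait(2) that throws away the status of any child other than the one it
wants.  The names old_style_wait, spawn and lost_status_demo are made
up for the illustration; they are not library routines.  It also
assumes a system that has waitid(2) with WNOWAIT, which is used only to
make sure the fast child has exited before the slow one is waited for.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the wait loop inside an old-style pclose:
 * reap children until the wanted one turns up, discarding the rest. */
static int old_style_wait(pid_t wanted)
{
    int status;
    pid_t r;

    while ((r = wait(&status)) != wanted && r != -1)
        ;                       /* some other child: status thrown away */
    return r == -1 ? -1 : status;
}

/* Start a child that sleeps for `delay` seconds, then exits with `code`. */
static pid_t spawn(unsigned delay, int code)
{
    pid_t pid = fork();

    if (pid == 0) {
        sleep(delay);
        _exit(code);
    }
    return pid;
}

/* Returns 0 once the scenario has run; 1 if the setup itself failed. */
int lost_status_demo(void)
{
    siginfo_t info;
    pid_t fast, slow;
    int status;

    fast = spawn(0, 3);
    /* Block until `fast` has exited, but leave it unreaped (WNOWAIT). */
    if (fast == -1 ||
        waitid(P_PID, (id_t)fast, &info, WEXITED | WNOWAIT) == -1)
        return 1;
    slow = spawn(1, 5);
    if (slow == -1)
        return 1;

    /* Like pclose( pout1 ): gets fast's status first and discards it. */
    status = old_style_wait(slow);
    assert(WIFEXITED(status) && WEXITSTATUS(status) == 5);

    /* Like pclose( pin ) done last: the status is gone, no child left. */
    errno = 0;
    status = old_style_wait(fast);
    assert(status == -1 && errno == ECHILD);
    return 0;
}
```

This reproduces the "no child" case.  With a third child still running
(the pout2 case above), the second call would block in wait instead of
failing.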

There are many other variations possible.  (I first encountered
this in a Perl program with only two pipes, but there were system(3)
calls done in the body of the loop, which presumably were doing
their own wait.)  I ended up with the above version when I tried
to reduce the problem to determine whether it was Perl's fault.

I have isolated this behaviour on two different systems - Interactive
Systems System 5/386 (I should know the release number, ask me tomorrow)
and Xenix System III for 68000.

(Finally the question...)

Is this behaviour common to all System 5 variations?  To BSD
derivatives?  SunOS?  AIX?  Your favourite here?

Is there even a good general solution?  I can see only one good way
to handle all of the variations of some routine wanting to wait for
a specific child and getting the termination info for a different child
instead (which will eventually be waited for - perhaps by a totally
different routine).  That would be to provide some new library routines:
waitfor( child, &status ) and postwait( child, status ).  Waitfor would
wait for a specified child (and save information internally on any other
children that terminate in the meantime).  Postwait would let a routine
that had done a wait call and gotten the termination status of a child
it didn't know about hand that status to the store that waitfor
consults.  These routines could be used internally by pclose, system, and
any other library routines that have waiting for a specific child as a
part of their semantics, as well as being provided to the user as a new
pair of library routines for building additional capabilities that include
forked children as a part of their implementation.
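
To make the proposal concrete, here is a rough sketch of the two
routines.  Everything beyond the names waitfor and postwait is an
assumption made for the illustration: the return values, the fixed-size
table (MAXSAVED), and the helpers spawn_exit and waitfor_demo.  Where
waitpid(2) is available it covers the "wait for this particular child"
half by itself, but it does not help when some other routine has
already collected the status with a plain wait, which is the gap
postwait is meant to fill.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAXSAVED 32             /* arbitrary limit for this sketch */

static struct saved { pid_t pid; int status; } saved[MAXSAVED];
static int nsaved;

/* Record a status that was collected for a child the caller did not
 * want.  Returns 0, or -1 if the table is full. */
int postwait(pid_t child, int status)
{
    if (nsaved >= MAXSAVED)
        return -1;
    saved[nsaved].pid = child;
    saved[nsaved].status = status;
    nsaved++;
    return 0;
}

/* Wait for one specific child.  Returns its pid, or -1 on error. */
pid_t waitfor(pid_t child, int *status)
{
    int i, st;
    pid_t r;

    /* Was its status already collected and saved by someone else? */
    for (i = 0; i < nsaved; i++) {
        if (saved[i].pid == child) {
            if (status != NULL)
                *status = saved[i].status;
            saved[i] = saved[--nsaved];     /* drop the entry */
            return child;
        }
    }
    for (;;) {
        r = wait(&st);
        if (r == -1) {
            if (errno == EINTR)
                continue;
            return -1;                      /* e.g. ECHILD */
        }
        if (r == child) {
            if (status != NULL)
                *status = st;
            return child;
        }
        (void)postwait(r, st);  /* save it; lost only if the table is full */
    }
}

/* Hypothetical helper: start a child that exits at once with `code`. */
static pid_t spawn_exit(int code)
{
    pid_t pid = fork();

    if (pid == 0)
        _exit(code);
    return pid;
}

/* Returns 0 once the scenario has run; 1 if the setup itself failed. */
int waitfor_demo(void)
{
    pid_t a, b, c, r;
    int sa, sb, sc, st;

    a = spawn_exit(3);
    b = spawn_exit(5);
    if (a == -1 || b == -1)
        return 1;
    /* Waiting for b first no longer costs us a's status. */
    assert(waitfor(b, &sb) == b && WEXITSTATUS(sb) == 5);
    assert(waitfor(a, &sa) == a && WEXITSTATUS(sa) == 3);

    /* A routine that did a plain wait and got a child it doesn't know. */
    c = spawn_exit(7);
    if (c == -1 || (r = wait(&st)) == -1)
        return 1;
    postwait(r, st);
    assert(waitfor(c, &sc) == c && WEXITSTATUS(sc) == 7);
    return 0;
}
```

A pclose built on waitfor could then be closed in any order relative to
other pipes.  One thing the sketch ignores is that a saved entry nobody
ever asks for stays in the table, and could be mistaken for a later
child if the system reuses the process id.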
-- 
John Macdonald