csh pgrp problem

Sun Aug 13 06:50:32 AEST 1989

In article <712 at skye.ed.ac.uk> richard at aiai.UUCP (Richard Tobin) writes:
>Running under SunOS 4, we occasionally encounter an annoying problem:
>a pipeline (eg cat /etc/passwd | more) will stop, with the message
>
>   Stopped (tty output)
>
>I believe I've found the problem, what I want to know is whether there's
>a simple fix, perhaps in more recent versions of csh than we have here.

Unfortunately, the `simple' fix I know of is to continue to use vfork
with csh...

>I had some problems when I first compiled this shell for SunOS 4, and
>the simplest solution seemed to be to #undef VFORK, since fork() in
>SunOS 4 does copy-on-write.

Here's some history on this stuff.  While I was working at Sun, I did
most of the work on the new VM system.  When the new VM project was
started, we believed that we could just have the vfork system call do a
standard copy-on-write fork for binary compatibility.  Then we could
retire vfork from the C library since vfork was a hack marked for
deletion.

When I ran a prototype new VM kernel on my workstation, I occasionally
ran into the "Stopped (tty output)" problem when using csh and pipes.  I
spent some time tracking this mess down.  I found that if I compiled
csh with VFORK not defined, csh would occasionally fail the same way
that it did running on the new VM system with vfork replaced by fork.
From this I concluded that I was seeing the result of a long-standing
csh bug that was never noticed at Berkeley (where both csh and vfork
originated) since vfork was always used there for csh.  Folklore has it
that vfork was created solely for csh because of the performance cost
of csh doing Unix forks in a paged environment without copy-on-write.

After tracking down the race condition in the way csh sets up process
groups, I decided that it was too hard for me personally to fix
(I was doing kernel VM work, not csh support).  As time went on, we
found more places that depended on the subtle effects of vfork.
Eventually it was decided that SunOS needed to continue to support
vfork even after we had a copy-on-write fork just because of a few
$%$#$!* programs that either took advantage of the vfork semantics
(e.g., csh using vfork to keep exec hash statistics) or accidentally
depended on them (e.g., the csh process group problem when not using
vfork).

>What seems to be happening is that the shell forks twice (once for cat
>and once for more).  Each child sets its process group to the jobid,
>which is cat's process id.  The first child sets the terminal process
>group to the same thing.  However, there's nothing to guarantee that
>the first child sets the terminal process group before the second child
>starts running, and perhaps once in 20 times it doesn't.  In these
>cases the ioctls performed by more cause a SIGTTOU.

Yes - this is the problem that I found.  And this is one of the reasons why
SunOS 4.0 csh still uses vfork even though fork now uses copy-on-write.
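
One way a fork-only shell could close this window is to hold each child
back until the job-control setup is done.  The sketch below is an
illustration under assumptions, not the csh or SunOS code: the function
name spawn_after_pgrp_setup and the pipe used as a gate are hypothetical,
and the terminal step is only indicated in a comment because it needs a
controlling tty.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from csh: fork a child that waits on a
 * pipe until the parent has placed it in its own process group.
 * Returns the child's exit status (0 if the child saw the expected
 * process group, nonzero otherwise) or -1 on error.
 */
int spawn_after_pgrp_setup(void)
{
    int gate[2];
    int status;
    int ok;
    pid_t pid;
    char token = 'x';

    if (pipe(gate) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(gate[0]);
        close(gate[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: block until the parent says the setup is complete. */
        char c;
        close(gate[1]);
        if (read(gate[0], &c, 1) != 1)
            _exit(2);               /* parent gave up before releasing us */
        close(gate[0]);
        /* A real shell would exec the command here. */
        _exit(getpgrp() == getpid() ? 0 : 1);
    }

    /* Parent: do the job-control setup before releasing the child. */
    close(gate[0]);
    ok = (setpgid(pid, pid) == 0);
    /* A foreground job would also need tcsetpgrp(tty_fd, pid) here. */
    ok = ok && (write(gate[1], &token, 1) == 1);
    close(gate[1]);                 /* on failure the child sees EOF */

    if (waitpid(pid, &status, 0) == -1 || !ok || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The function returns 0 when the child found itself leading its own
process group by the time it was released.  A pipeline would repeat this
for each child, using the first child's pid as the group for all of them.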

>Presumably using vfork() forces things to happen in the right order.

Exactly - when using vfork the child process gets to run first and
"borrow the address space" of the parent until the child execs or
exits.  After the child execs or exits, the parent gets its address
space back from the child process and only then gets to run.
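
The ordering described above can be observed with a small experiment.
This is only a sketch, assuming a system with traditional vfork
semantics; where vfork is implemented as a plain fork the parent will not
see the child's store.  The name vfork_child_runs_first is made up for
the illustration, and storing to memory in a vfork child is exactly the
sort of trick that POSIX leaves undefined, so treat it as a demonstration
and not as a pattern to copy.

```c
#define _DEFAULT_SOURCE   /* expose vfork() under strict -std= modes on glibc */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Written by the vfork child while it is borrowing the parent's memory. */
static volatile int child_ran_first = 0;

/*
 * Hypothetical demo: returns 1 if the parent saw the child's store when
 * it resumed, 0 if it did not (for example, where vfork is really fork),
 * and -1 on error.
 */
int vfork_child_runs_first(void)
{
    pid_t pid;
    int status;

    child_ran_first = 0;

    pid = vfork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /*
         * Child: with traditional vfork the parent is suspended and this
         * store lands in the parent's address space.  POSIX does not
         * guarantee this; it is shown only to make the sharing visible.
         */
        child_ran_first = 1;
        _exit(0);
    }

    /* Parent: with traditional vfork, we get here only after the child's _exit. */
    if (waitpid(pid, &status, 0) == -1)
        return -1;

    return child_ran_first;
}
```

A result of 1 means the child ran to completion inside the parent's
address space before the parent continued, which is the property the
csh process group code was relying on without saying so.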

I think that the general lesson to be learned here is not to introduce
"temporary hack system calls": they can be hard to get rid of later,
because some important program(s) will end up depending, either
accidentally or consciously, on the (subtle effects of the) hack.

Joseph Moran
Legato Systems Inc.
260 Sheridan Avenue
Palo Alto, CA  94306
(415) 329-7886
mojo at legato.com or {sun,uunet}!legato!mojo