Sys V fork IS broken! No it isn't! Is too! Is not!

Wed Aug 1 13:37:05 AEST 1990

gwyn at smoke.BRL.MIL (Doug Gwyn) says:

>The bug is that your application makes no attempt to recover from a known
>class of error, EAGAIN in this case.

And (grumble grumble) he is right.  Touche'.  IMHO this error should not
happen, but if it's documented, applications have to deal with it.
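
Concretely, "dealing with it" means the caller has to tell EAGAIN (resources temporarily unavailable) apart from other failures. Below is a minimal sketch in C. The enum and try_fork() are hypothetical names. Treating every errno other than EAGAIN as not worth retrying is an assumption of the sketch, not something the manual promises.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Outcome of a single fork() attempt (hypothetical classification). */
enum fork_outcome {
    FORK_CHILD,      /* we are the new child process */
    FORK_PARENT,     /* we are the parent; child pid stored in *pid */
    FORK_TRANSIENT,  /* EAGAIN: resources temporarily unavailable; may retry */
    FORK_FATAL       /* any other errno: assumed not worth retrying */
};

/* Attempt one fork() and classify the result for the caller's policy. */
enum fork_outcome try_fork(pid_t *pid)
{
    *pid = fork();
    if (*pid == 0)
        return FORK_CHILD;
    if (*pid > 0)
        return FORK_PARENT;
    /* fork() returned -1: errno says why */
    return (errno == EAGAIN) ? FORK_TRANSIENT : FORK_FATAL;
}
```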

fork() is still broken.  This failure is fundamentally unlike many of the
others that have been raised (write(2), wait(2)).  The key difference is
determinism.  Not only is this failure transient and non-deterministic, but
there is also no deterministic step an application can take to fix (or even
diagnose) the problem.  So the application lacks the information it would need
to base a policy decision on, and the minimal-policy arguments are flawed.

rice at dg-rtp.dg.com (Brian Rice) says:

>But I do think there's something to be said in defense of traditional fork.

I agree.  In my view, traditional fork either works or fails in a
deterministic fashion.  But whaddo I know, I went straight from V7 to bsd...

>Well, maybe the kernel could queue each fork request that it was unable to
>complete and then satisfy each request in order...or maybe it could satisfy
>the smallest request first, with some kind of aging mechanism to keep from
>starving forks of big processes, etc., etc....this would get complicated,
>clearly, and might even require so much overhead as to provoke thrashing.  But
>maybe you could do it.
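
Rice's "smallest request first, with aging" idea can be sketched as a selection rule. This is a user-mode model only. It assumes each queued request records the pages the child would need and how long it has waited. struct fork_request, pick_next(), and the AGING_CREDIT weight are all hypothetical, and no real kernel is implied to work this way.

```c
#include <stddef.h>

/* A pending fork request the (hypothetical) kernel could not yet satisfy. */
struct fork_request {
    int  pid;         /* parent that asked to fork */
    long pages;       /* memory the child image would need */
    long wait_ticks;  /* how long the request has been queued */
};

/* Hypothetical tuning knob: pages of "credit" earned per tick of waiting. */
#define AGING_CREDIT 64L

/* Choose the request with the smallest effective size, where waiting lowers
 * the effective size.  If that request does not fit in free_pages, serve
 * nothing: letting smaller requests jump ahead would starve it.
 * Returns an index into q, or -1 if nothing should be served now. */
int pick_next(const struct fork_request *q, size_t n, long free_pages)
{
    int best = -1;
    long best_score = 0;

    for (size_t i = 0; i < n; i++) {
        long score = q[i].pages - AGING_CREDIT * q[i].wait_ticks;
        if (best < 0 || score < best_score) {
            best = (int)i;
            best_score = score;
        }
    }
    if (best >= 0 && q[best].pages > free_pages)
        return -1;  /* head of queue must wait for memory to free up */
    return best;
}
```

When the best-scoring request does not fit, the sketch deliberately idles. That keeps an aged large fork from being starved by a stream of small ones. The price is memory left idle while it waits, which is part of the overhead Rice worries about.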

Alternatively, a virtual memory system could have been designed so that it
does not exhibit this pattern of transient non-deterministic failures.  There
are a substantial number of existence proofs that this is possible.

boyd at necisa.ho.necisa.oz (Boyd Roberts) says:

>When fork() fails with EAGAIN it fails for a good reason.

That's a controversial assertion.  I, for one, am highly unconvinced.

>It would seem that there is some consensus to change the semantics of
>fork() to retry.  This would break a critical interface.  System calls
>do one thing, and one thing well.

System calls *should* do one thing well.  I am arguing precisely that fork(),
in forcing user-mode code to deal with transient conditions beyond any hope of
diagnosis or control from user mode, is failing to do one thing well.  In
fact, instead of doing

    fork_if_possible

(the _if_possible is, of course, implicit in all system calls), it is doing

    fork_if_possible_and_the_current_transient_system_state_makes_it_straightforward_within_the_architecture_of_this_vm_implementation

That smells like a violation of the whole unix approach.  The "critical
interface" is already broken; it should be fixed.

I think the proposal some have raised, putting a fork_with_backoff() in libc,
has merit.  Let's call it unbroken_fork().
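
Here is a minimal sketch of such a wrapper. It assumes retries happen only on EAGAIN and that the caller chooses the attempt limit. The backoff starts at one second and doubles up to 32 seconds. Those numbers are illustrative assumptions and do not come from any libc.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical libc addition: retry fork() while it fails with EAGAIN,
 * doubling the delay between attempts, for at most max_tries attempts.
 * Returns what fork() returns.  On final failure errno is the one set by
 * the last fork() call; max_tries < 1 yields -1 with errno = EINVAL. */
pid_t unbroken_fork(int max_tries)
{
    unsigned delay = 1;                   /* seconds before the next retry */
    pid_t pid = -1;

    if (max_tries < 1) {
        errno = EINVAL;
        return -1;
    }
    for (int attempt = 1; attempt <= max_tries; attempt++) {
        pid = fork();
        if (pid >= 0 || errno != EAGAIN)
            return pid;                   /* success, or a non-transient error */
        if (attempt == max_tries)
            break;                        /* out of attempts; skip the sleep */
        sleep(delay);
        if (delay < 32)
            delay *= 2;                   /* exponential backoff, capped at 32s */
    }
    return pid;
}
```

A caller writes pid = unbroken_fork(5); and handles 0, a positive pid, and -1 exactly as with plain fork(). The wrapper still cannot tell why fork() failed. It only moves the retry policy into one shared place.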

Cheers, Tim Bray, Open Text Systems, Waterloo, Ont.