stdio EOF

Sun Sep 11 20:09:13 AEST 1988

In article <13427 at mimsy.UUCP>, chris at mimsy.UUCP (Chris Torek) writes:
> In article <8422 at smoke.ARPA> gwyn at smoke.ARPA (Doug Gwyn ) writes:
>> ... In fact [stdio] EOF should not be "sticky"; if more data becomes
>> available, as on a terminal, it should be available for subsequent
>> reading.  The 4.2BSD implementation broke this but it might be okay
>> on 4.3BSD.
> 
> I thought this behaviour was added to 4.2BSD to conform to some
> existing standard.

Berkeley conform to an existing standard?  You must be kidding.

The story I read on the net a few years ago is that Berkeley made this
change to fix a problem with fread.  The problem is that the fread
documentation contradicts itself, stating both that "fread returns
the number of items actually read" and that "fread returns 0 on end of
file or error."  What should fread do when its caller requests three
items, but fread encounters an end of file after reading only two?
The first sentence claims it should return two (the number of items
read), while the second claims it should return zero (because end of
file was encountered).
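
To make the three-items-requested, two-available case concrete, here is
a minimal sketch in C.  The helper name fread_short_demo and the use of
tmpfile() as a scratch file are assumptions made for illustration; they
do not come from any library discussed here.  The helper writes a given
number of items, asks fread for more, and records what the first and
second calls return.  Under ISO C as later standardized, fread returns
the number of complete items read, so the first call reports the short
count and the second reports zero; this says nothing about what any
particular 1988 library did.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write `have` ints to a scratch file, then ask
   fread for `want` ints twice.  Stores both return values.
   Returns 0 on success, -1 if the scratch file could not be set up. */
static int fread_short_demo(size_t have, size_t want,
                            size_t *first, size_t *second)
{
    int data[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    int buf[8] = {0};
    FILE *fp;

    assert(have <= 8 && want <= 8);     /* stay inside the arrays */

    fp = tmpfile();
    if (fp == NULL)
        return -1;
    if (fwrite(data, sizeof data[0], have, fp) != have) {
        fclose(fp);
        return -1;
    }
    rewind(fp);

    *first = fread(buf, sizeof buf[0], want, fp);   /* short read */
    *second = fread(buf, sizeof buf[0], want, fp);  /* already at EOF */

    fclose(fp);
    return 0;
}
```

Calling fread_short_demo(2, 3, &a, &b) corresponds to the case above:
the caller can tell a short read from a clean end of file only by
comparing the first count with the number requested.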

Berkeley interpreted the documentation as indicating that fread should
return two, but should then return zero on the next call.  The obvious
way to implement this would be to have fread do an ungetc on the EOF
so that the next time it was called it would immediately read an EOF
and return zero.  However, ungetc does not allow an EOF to be pushed
back onto the input.  This deficiency of ungetc is (in my view) the
biggest flaw in the design of the stdio library, and it makes it
impossible to implement scanf correctly, so Berkeley would have done
the world a favor by extending the stdio library to allow EOF to be
pushed back.
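
The ungetc restriction can be checked directly.  The following is a
small sketch, assuming a C library that follows ISO C, where
ungetc(EOF, fp) is specified to fail and leave the stream unchanged.
The helper name ungetc_rejects_eof and the tmpfile() scratch file are
hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: put one character `c` in a scratch file, try to
   push EOF back, then read.  Returns 1 if the push-back was refused and
   the next character read is still `c`, 0 otherwise, -1 on setup failure. */
static int ungetc_rejects_eof(int c)
{
    FILE *fp;
    int refused, next;

    assert(c != EOF);                   /* c must be a real character */

    fp = tmpfile();
    if (fp == NULL)
        return -1;
    if (fputc(c, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    rewind(fp);

    refused = (ungetc(EOF, fp) == EOF); /* EOF cannot be pushed back */
    next = getc(fp);                    /* stream should be untouched */

    fclose(fp);
    return refused && next == (unsigned char)c;
}
```

Because the push-back is refused, a routine that has already consumed
the end-of-file indication has no portable way to hand it back to the
next reader.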

Instead, they chose a simpler approach:  make getc always return EOF
when the eof or error flags are set.  This approach allowed them to
fix the fread problem by writing only a couple of lines of code, but
it also broke getc.  In 4.2 BSD the behavior of getc is a bug since it
disagrees with the documentation.  In 4.3 BSD, Berkeley modified the
documentation to agree with the code.  ("It's not a bug, it's a feature!")
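
The difference between the two behaviors can be sketched with a toy
model.  Everything here (struct toy_stream, toy_arrive,
toy_getc_nonsticky, toy_getc_sticky, toy_clearerr) is hypothetical and
is not taken from any real stdio source; it only models "retry the
source on every call" against "return EOF whenever the flag is set".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model of an input stream; not the real FILE structure. */
struct toy_stream {
    char data[64];      /* bytes that have "arrived" so far */
    size_t len;         /* number of valid bytes in data */
    size_t pos;         /* index of the next byte to hand out */
    int eof_flag;       /* set when a read finds no data */
};

/* Simulate more input arriving, as on a terminal after an EOF. */
static void toy_arrive(struct toy_stream *s, const char *more)
{
    size_t n;

    assert(s != NULL && more != NULL);
    n = strlen(more);
    if (n > sizeof s->data - s->len)
        n = sizeof s->data - s->len;    /* drop what does not fit */
    memcpy(s->data + s->len, more, n);
    s->len += n;
}

/* Non-sticky: look at the source on every call; set the flag on EOF. */
static int toy_getc_nonsticky(struct toy_stream *s)
{
    if (s->pos < s->len)
        return (unsigned char)s->data[s->pos++];
    s->eof_flag = 1;
    return EOF;
}

/* Sticky: once the flag is set, return EOF without looking again. */
static int toy_getc_sticky(struct toy_stream *s)
{
    if (s->eof_flag)
        return EOF;
    return toy_getc_nonsticky(s);
}

/* Clear the flag, in the spirit of clearerr(). */
static void toy_clearerr(struct toy_stream *s)
{
    s->eof_flag = 0;
}
```

In this model, data that arrives after an EOF is returned by the next
toy_getc_nonsticky call, while toy_getc_sticky keeps returning EOF
until toy_clearerr is called.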

By the way, AT&T also noticed the contradiction in the fread documentation.
They fixed the documentation so that it clearly reflected the behavior
of the code.  This seems like a better approach since modifying the code
to agree with the documentation doesn't make much sense when the meaning
of the documentation is so unclear.  In any case, AT&T's approach, unlike
Berkeley's, didn't break working code.

> What does the dpANS say?  POSIX?

I don't know, and how they resolve this issue matters less than that
they resolve it.  The standard I/O library is supposed to be
*standard*; that's the whole point of it.  There are, however, several
reasons why they should prefer Dennis Ritchie's original definition of
getc over Berkeley's:

1.  Ritchie's definition has seniority.  Berkeley's gratuitous change to
    getc was not made until 4.2 BSD and was not documented until 4.3 BSD.
    All other versions of UN*X use Ritchie's definition.

2.  Aesthetics.  Ritchie's definition can be stated in seven words:  Return
    EOF when at end of file.

3.  Authority.  If anyone's opinion should be respected when setting UN*X
    standards, Ritchie's should be.

					Kenneth Almquist

-- 
And there shall come among you false prophets, who will corrupt my
teachings and teach that EOF should be sticky....