Fix to sed (what's a text file?)

Doug Gwyn <gwyn> gwyn at brl-tgr.ARPA
Tue Oct 22 10:01:24 AEST 1985


> >Many UNIX text-file utilities will discard a (necessarily final)
> >text line that does not end in a newline.  Quite simply, such a
> >file is not a proper UNIX text file.
> 
> Who says?  Where's the definition of a 'proper' UNIX text file?

The problem is, there are several interpretations of such a file,
depending on the utility involved.  Perhaps there should be a
well-defined standard interpretation, but there isn't currently.

"A file of text consists simply of a string of characters, with
lines demarcated by the newline character."  -- from "The UNIX
Time-Sharing System" by Ritchie & Thompson

"text file, ASCII file -- a file, the bytes of which are understood
to be in ASCII code"  -- from "Glossary" in "UNIX Time-Sharing
System Programmer's Manual", 8th Ed.

"A text stream is an ordered sequence of bytes composed into lines,
each line consisting of zero or more characters plus a terminating
new-line character.  ...  The sequentially last character read in
from a text stream will, however, always be sequentially the last
character that was earlier written out to the text stream, if that
character was a new-line."  -- from ANSI X3J11/85-045

My personal choice would be similar to Ritchie & Thompson, where
newlines delimit (NOT "terminate") text lines, so that the last
character in a text file would not need to be a newline.  However,
this raises the question of what utilities should do with the
null line at the end of every text file that DOES end with a
newline; this will still be utility-dependent (and should be
documented whenever it is handled differently from other text
lines in the file).
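
To make the distinction concrete, here is a minimal sketch in Python
of the two readings.  The function names and the keep_partial flag are
made up for illustration; they do not correspond to any existing
utility or library routine.

```python
def lines_terminated(text, keep_partial=True):
    """Newline TERMINATES a line (the X3J11-style reading).

    Anything after the last newline is an unterminated partial line.
    keep_partial=False mimics a utility that silently discards it;
    that flag is a hypothetical knob, not a real option anywhere.
    """
    parts = text.split("\n")
    complete, partial = parts[:-1], parts[-1]
    if partial and keep_partial:
        return complete + [partial]
    return complete


def lines_delimited(text):
    """Newline DELIMITS lines (the Ritchie & Thompson-style reading).

    A file ending in a newline yields a final null (empty) line,
    which each utility must then decide how to treat.
    """
    return text.split("\n")


if __name__ == "__main__":
    for sample in ("a\nb\n", "a\nb"):
        print(repr(sample),
              lines_terminated(sample),
              lines_terminated(sample, keep_partial=False),
              lines_delimited(sample))
```

Under the "terminate" reading the only open question is what to do
with a trailing partial line; under the "delimit" reading nothing is
ever lost, but the trailing null line shows up for every file that
ends in a newline.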

X3J11/85-045 botched it anyhow, since the committee intended that ALL
UNIX files qualify as "text streams" under stdio (vs. "binary
streams", which have to be handled differently on some non-UNIX OSes),
yet a file lacking a final newline does not fit the definition quoted
above.

So, how do we establish a standard interpretation for non-newline-
terminated UNIX text files?

(Discussion should move to net.unix.)


