File size in bytes (was: Re: binary data files)
Steve Summit
scs at adam.pika.mit.edu
Thu May 4 16:52:20 AEST 1989
In article <14301 at bfmny0.UUCP> tneff at bfmny0.UUCP (Tom Neff) writes:
>MS-DOS has exact filesizes in bytes, and a standard OS call to retrieve
>a file's size in bytes.
So sorry; I was imprecise. Both VMS and MS-DOS let you find out
how many bytes the operating system thinks the file contains.
The trouble is that this number is not equal to the number of
characters that you will read from the file if you do the usual C
text read with single \n's as line terminators.
VMS text files (well, the VMS file format normally used for text
files; VMS has many file formats) have no explicit line termination
(neither CR nor LF); however, attached to each line is a 16-bit
record length, stored in the file and counted against the total
file size. MS-DOS uses the two-character sequence CR-LF as a
line terminator; any reasonable C run-time library translates
each CRLF to a single \n when "text mode" reads are performed.
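In either case*, the relation
size = chars + lines
holds, where size is the OS-reported size in bytes, chars is the
number of characters a text-mode C program would read, and lines
is the number of lines (the number of \n's read by a C program)**.
Each line costs two bytes in the file (the 16-bit record length on
VMS, the CR-LF pair on MS-DOS) but delivers only one extra
character, the \n, to the program.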
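The relation above can be checked with a few lines of C. What follows is only a sketch: the names text_counts, count_text and expected_os_size are made up for illustration, and the stream is assumed to have been opened in text mode (plain "r") on a system that behaves as described above.

```c
#include <stdio.h>

/* Counts gathered from one pass over a stream (hypothetical helper type). */
struct text_counts {
    long chars;  /* characters returned by getc(), including each '\n' */
    long lines;  /* number of '\n' characters seen */
};

/* Read fp to end-of-file, counting characters and newlines. */
struct text_counts count_text(FILE *fp)
{
    struct text_counts tc = {0, 0};
    int c;

    while ((c = getc(fp)) != EOF) {
        tc.chars++;
        if (c == '\n')
            tc.lines++;
    }
    return tc;
}

/*
 * Size the OS would be expected to report under the relation
 * size = chars + lines. This assumes every line ends in a newline
 * and ignores any rounding up to a block boundary.
 */
long expected_os_size(struct text_counts tc)
{
    return tc.chars + tc.lines;
}
```

A caller would fopen() the file with mode "r", pass the stream to count_text(), and compare expected_os_size() with whatever the OS reports. On a Unix-style system, where text and binary reads are identical, the OS-reported size should simply equal chars, so the extra lines term applies only to systems like the two discussed here.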
This discrepancy has implications for the tar file example I
mentioned, since tar format uses (and the file size in the tar
header must therefore reflect) single \n's as line terminators.
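For the tar case, one way around the discrepancy is to ignore the OS-reported size and count what a text-mode read actually delivers. The sketch below does that and then formats the count as a size field. The function names are hypothetical, and the field layout (11 zero-padded octal digits plus a terminating NUL, 12 bytes in all) is an assumption based on the common tar header format, not something stated above.

```c
#include <stdio.h>

/*
 * Count the characters a read of fp delivers. If fp was opened in
 * text mode, each line terminator is counted as a single '\n'.
 */
unsigned long long text_mode_length(FILE *fp)
{
    unsigned long long n = 0;

    while (getc(fp) != EOF)
        n++;
    return n;
}

/*
 * Write size into field as 11 zero-padded octal digits plus a NUL.
 * Returns 0 on success, -1 if the value does not fit in 11 digits.
 * (Assumed layout; check it against the tar variant in use.)
 */
int format_tar_size(char field[12], unsigned long long size)
{
    if (size > 077777777777ULL)
        return -1;
    snprintf(field, 12, "%011llo", size);
    return 0;
}
```

The cost is an extra pass over each file before its header can be written, since the count is not known until the whole file has been read.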
Steve Summit
scs at adam.pika.mit.edu
* Actually, on VMS, the OS-reported size for variable-length,
carriage control (i.e. standard text) files might be rounded up
to the next multiple of the block size. It's been a while
since I used VMS.
** Modulo files without final newlines, which are rare on MS-DOS
and impossible on VMS.