binary data files

Bennett Todd bet at dukeac.UUCP
Thu May 4 04:38:58 AEST 1989


In article <11021 at bloom-beacon.MIT.EDU> scs at adam.pika.mit.edu (Steve Summit)
writes that assuming you can stat a file for its size breaks down on non-UNIX
systems, and recommends reading into a dynamically allocated buffer that he
grows linearly.

I have often done similar things. The getline() routine in my libbent does a
conceptually similar job for reading arbitrarily long text lines (where you
don't know in advance how many bytes to allocate to last you until the next
newline on input). Also, in an image I/O and manipulation library I wrote, I
wanted to be able to read an image from a pipe. I disbelieve in header
parsing, and deduce image dimensions from the file length, so I had to do
roughly the same thing.
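
For what it's worth, a minimal sketch of such a reader might look like the
following (illustrative only, not the actual libbent getline(); read_line is
a made-up name, it uses the doubling growth described in the next paragraph,
and the 128-byte starting size is the figure mentioned below):

	#include <stdio.h>
	#include <stdlib.h>

	/*
	 * Illustrative sketch only, not the libbent routine: read one
	 * arbitrarily long line from fp into a malloc'd buffer, doubling
	 * the allocation whenever it fills up.  Returns NULL at EOF or
	 * on allocation failure; the caller frees the result.
	 */
	char *
	read_line(FILE *fp)
	{
		size_t nallocated = 128;	/* starting allocation */
		size_t nused = 0;
		char *buf = malloc(nallocated);
		int c;

		if (buf == NULL)
			return NULL;
		while ((c = getc(fp)) != EOF && c != '\n') {
			if (nused + 1 >= nallocated) {	/* keep room for '\0' */
				char *tmp;
				nallocated *= 2;
				tmp = realloc(buf, nallocated);
				if (tmp == NULL) {
					free(buf);
					return NULL;
				}
				buf = tmp;
			}
			buf[nused++] = (char)c;
		}
		if (c == EOF && nused == 0) {
			free(buf);
			return NULL;		/* no more input */
		}
		buf[nused] = '\0';
		return buf;
	}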

However, I am not sure I like the linear reallocation strategy. I would tend
to assume that realloc is often implemented as a malloc/memcpy/free sequence,
and thus I try to avoid working it too hard. A binary (doubling) growth
algorithm turned out to be just as easy to code; basically it looks just like
Steve's linear algorithm, except that instead of

	nallocated += 10;

I use 

	nallocated *= 2;

Also, I start with somewhat larger allocations: for getline() I started with
128 bytes, and for the image reading facility I start with 65536. Finally,
where you are reading until you hit EOF, by all means issue one big read for
each realloc, rather than reading along one or two bytes at a time.
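
Concretely, a read-until-EOF loop along those lines might be sketched as
follows (again illustrative, not the actual image library; slurp_stream is a
made-up name, and the 65536-byte starting size is the one mentioned above):

	#include <stdio.h>
	#include <stdlib.h>

	/*
	 * Illustrative sketch: slurp an entire stream (e.g. an image
	 * arriving on a pipe) into one buffer.  The allocation doubles
	 * each time it fills, and after every growth a single big fread
	 * asks for all of the newly gained space at once instead of
	 * nibbling a byte or two at a time.
	 */
	char *
	slurp_stream(FILE *fp, size_t *lenp)
	{
		size_t nallocated = 65536;	/* starting allocation */
		size_t nused = 0;
		char *buf = malloc(nallocated);

		if (buf == NULL)
			return NULL;
		for (;;) {
			size_t got = fread(buf + nused, 1, nallocated - nused, fp);

			nused += got;
			if (got == 0)			/* EOF (or error) */
				break;
			if (nused == nallocated) {	/* buffer full: double it */
				char *tmp;
				nallocated *= 2;
				tmp = realloc(buf, nallocated);
				if (tmp == NULL) {
					free(buf);
					return NULL;
				}
				buf = tmp;
			}
		}
		*lenp = nused;
		return buf;
	}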

I actually haven't done any performance measurements to determine whether I am
buying any speed with this strategy; however, it isn't much harder to code,
and I am sure on some (if not most) machines the vendor doesn't take enough
care to optimize performance of realloc().

-Bennett
bet at orion.mc.duke.edu


