I/O of complex data structures in C

Chris Torek chris at mimsy.umd.edu
Sun Aug 5 00:27:07 AEST 1990


In article <140087 at sun.Eng.Sun.COM> jasonf at cetemp.Eng.Sun.COM
(Jason Freund) writes:
>Followup-to: s

(There is no such newsgroup.  I put followups back in the groups in
which the original appeared.)

>Basically, a maze is a complex data structure (a 2D array of and array of
>pointers to blah, blah... (it's deep)).  So that means I want to use fread()
>and fwrite() (right?).

Maybe; indeed, even probably:

>When you save data in a database, does the program just go:
>"fwrite(pointer, sizeof, *pointer, items, stream)" which somehow magically
>saves every piece of data (specified in the arguments) in such a way that it
>will be able to read in every piece of data back into their correct cells in
>the data structure?

No.

The primary rule of magic is this: `There is no magic'.  Fread and fwrite
read and write binary data (given a binary stream, as opened via, e.g.,
fopen(name, "wb")) by writing `nitems' objects, each of whose size is
as given, from the location given by the pointer argument.  If each object
is composed of simple bytes (e.g., ASCII or EBCDIC or ISO Latin 1 text),
those bytes will be written directly to the stream.  If each object is
composed of something more complicated, the bytes making up that object
will be written directly to the stream, whether or not that is a sensible
thing to do.

In particular, if the bytes composing the object are in the form of a
pointer, the resulting pointer (when read back via fread) is not
guaranteed to be sensible.  It will have the same bit pattern that the
original pointer had, but that bit pattern may no longer be a valid
pointer value---even if the reading is done by the same run of the
same program (garbage collecting implementations of C are legal, if
rare).  If the reading is done by a different run, or---consider the
effect of fixing a bug in the game---a different but similar program,
the chances are great that the bit pattern will not be useful.

So what can you do?  There are many approaches.  You can assume (as the
Unix `rogue' game does) that different runs of the same program will
be able to make use of the old values, and instead of writing out just
what you need, write out all data.  This approach has its pitfalls,
as anyone who had a winning game of rogue and saved it for the 17th
time will know.  You can convert pointers into indicies (provided that
the pointers point into contiguous memory regions), and write only
the data you need.  You can assume that the values of pointers can be
used to uniquely identify every object, no matter what its type, and
write the data `as is' but convert the pointers when reading back.

We used this latter approach to save arbitrary Lisp data in files for
later recovery, although in this case the saving routine had to worry
about circular data structures and was therefore more complicated---
the output format was a sequence of <id,tag,bytes> triples.  The id was a
unique identifier---probably actually the original pointer value---and
the tag and bytes appeared if and only if the id had not yet appeared
in the save file.  Id 0 was nil.  In this case the tag said what kind
of Lisp object the bytes represented, and if the object had pointers,
e.g., a dotted pair, the <bytes> were themselves <id,tag,bytes>
sequences.  Thus, a list (a b a) was represented as, e.g.,

	id=1, tag=DTPR, bytes=(
	 car: id=2, tag=ATOM, bytes="a";
	 cdr: id=3, tag=DTPR, bytes=(
	  car: id=4, tag=ATOM, bytes="b";
	  cdr: id=5, tag=DTPR, bytes=(
	   car: id=2;
	   cdr: id=0
	  )
	 )
	)

(I simplified this example; the atoms were actually composed of
name, boundp, value-if-bound, property-list sequences.)

This sort of format is pretty much ideal for portability (particularly
if you record numeric values in printed form).  The major disadvantage
is that producing and reading such files tends to be slow.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris at cs.umd.edu	Path:	uunet!mimsy!chris
	(New campus phone system, active sometime soon: +1 301 405 2750)



More information about the Comp.lang.c mailing list