Flat ASCII File Data Access

Brad Brown bradb at ai.toronto.edu
Thu Jan 26 12:22:53 AEST 1989


In article <225800111 at uxe.cso.uiuc.edu> mcdonald at uxe.cso.uiuc.edu writes:
>
>>                                                  ... I have rammed into
>>a wall in trying to access a flat ascii data file with 14,000 records in
>>it.  Naturally, I could read the file one record at a time, but the 
>>end user would probably expire due to old age if I wrote this program
>>in that manner.
>>[...]
>
>I have tried this sort of stuff of MS-DOS, and it doesn't seem to 
>do much good. Has anyone else gotten improvements this way? What
>DOES do some good is to get a good disk cache program.

I have done things like this in MS-DOS and it works *really well*.  I have
a tiny flatfile manager that uses lseek() and read() to seek to and read
specific records, and it runs much faster than doing the same thing through
stdio streams.  (That is, I use open() to open the file, *not* fopen().)
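
A minimal sketch of that kind of record access, not the actual flatfile
manager.  It assumes fixed-length records, POSIX-style headers (MS-DOS
compilers usually put these calls in <io.h> instead), and a made-up
RECORD_SIZE of 80 bytes; read_record() is a hypothetical name.

```c
#include <fcntl.h>      /* open(), O_RDONLY for the caller */
#include <sys/types.h>  /* off_t, ssize_t */
#include <unistd.h>     /* lseek(), read(), close() */

#define RECORD_SIZE 80  /* assumption: bytes per record, newline included */

/* Read record number `recno` (0-based) into `buf`, which must hold at
 * least RECORD_SIZE bytes.  Returns 0 on success, -1 on a seek error,
 * a read error, or a short read (for example, past end of file). */
int read_record(int fd, long recno, char *buf)
{
    off_t offset = (off_t)recno * RECORD_SIZE;
    ssize_t got;

    if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
        return -1;
    got = read(fd, buf, RECORD_SIZE);
    return (got == RECORD_SIZE) ? 0 : -1;
}
```

The caller opens the file once with open(name, O_RDONLY) and then asks for
records by number; nothing is buffered on the program side, so each call
costs one seek and one read.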

If you really have to move through a lot of data, write your program so
that it reads a large block of records (the larger the better) at once and
then processes them in memory.  This should help you a lot, because if you
do an individual disk read for each record, most of your overhead is going
to be in waiting for the disk.
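
Roughly what I mean, as a sketch under the same assumptions as before
(fixed-length records, POSIX-style headers).  RECORD_SIZE, BATCH_RECS and
both function names are made up for illustration; count_matching() just
stands in for whatever per-record processing you actually do.

```c
#include <fcntl.h>      /* open(), O_RDONLY for the caller */
#include <sys/types.h>  /* ssize_t */
#include <unistd.h>     /* read() */

#define RECORD_SIZE 80   /* assumption: bytes per record */
#define BATCH_RECS  256  /* assumption: records fetched per batch */

/* Fill `buf` with up to `max_recs` records starting at the current file
 * position.  Returns the number of complete records read (0 at end of
 * file), or -1 on a read error. */
long read_batch(int fd, char *buf, long max_recs)
{
    size_t want = (size_t)max_recs * RECORD_SIZE;
    size_t have = 0;
    ssize_t got;

    while (have < want) {
        got = read(fd, buf + have, want - have);
        if (got < 0)
            return -1;
        if (got == 0)       /* end of file */
            break;
        have += (size_t)got;
    }
    return (long)(have / RECORD_SIZE);
}

/* Example single pass: count the records whose first byte is `key`.
 * All the per-record work happens on the in-memory batch. */
long count_matching(int fd, char key)
{
    static char buf[RECORD_SIZE * BATCH_RECS];
    long n, i, hits = 0;

    while ((n = read_batch(fd, buf, BATCH_RECS)) > 0)
        for (i = 0; i < n; i++)
            if (buf[i * RECORD_SIZE] == key)
                hits++;
    return (n < 0) ? -1 : hits;
}
```

With these numbers a 14,000-record file takes 55 batches instead of 14,000
separate reads.  A trailing partial record, if the file has one, is ignored.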

Caching may or may not help you, depending on the type of processing you
do.  If you are just making a single pass through the data, a cache will
not be any faster than reading several records at a time yourself.  That's
because you still have to read each record from the disk once, and it is
never requested again, so the cache doesn't save you anything.

If you skip around the database a lot, you might want to think about
writing a record cache into the database part of your program.  A record
cache keeps a large pool of record slots in memory and fills in an empty
one whenever you read a new record.  If you request a record that was
recently read and is still in the cache, it returns a pointer to the
in-memory copy without touching the disk.  You should be able to go *even
faster* this way compared to a disk cache of the same size, though writing
an efficient cache can be hairy.
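
A bare-bones sketch of such a record cache, just to show the shape of it.
Everything here is an assumption for illustration: fixed-length records,
POSIX-style headers, the names rec_cache, cache_init() and cache_get(), and
the choice to throw out the least recently used slot when the pool is full.
The lookup is a plain linear scan, which is the easy but not the efficient
way to do it.

```c
#include <stddef.h>     /* NULL */
#include <fcntl.h>      /* open(), O_RDONLY for the caller */
#include <sys/types.h>  /* off_t */
#include <unistd.h>     /* lseek(), read() */

#define RECORD_SIZE 80  /* assumption: bytes per record */
#define CACHE_SLOTS 64  /* assumption: records kept in memory */

struct slot {
    long recno;          /* record held in this slot, -1 if empty */
    unsigned long used;  /* tick of last access, 0 if never used */
    char data[RECORD_SIZE];
};

struct rec_cache {
    int fd;
    unsigned long tick;  /* incremented on every access */
    long hits, misses;
    struct slot slots[CACHE_SLOTS];
};

void cache_init(struct rec_cache *c, int fd)
{
    int i;

    c->fd = fd;
    c->tick = 0;
    c->hits = c->misses = 0;
    for (i = 0; i < CACHE_SLOTS; i++) {
        c->slots[i].recno = -1;
        c->slots[i].used = 0;
    }
}

/* Return a pointer to record `recno`, going to the disk only on a miss.
 * The pointer stays valid until that slot is reused by a later call.
 * Returns NULL on a bad record number or a seek/read failure. */
const char *cache_get(struct rec_cache *c, long recno)
{
    struct slot *victim = &c->slots[0];
    int i;

    if (recno < 0)
        return NULL;
    for (i = 0; i < CACHE_SLOTS; i++) {
        struct slot *s = &c->slots[i];
        if (s->recno == recno) {        /* hit: no disk access */
            s->used = ++c->tick;
            c->hits++;
            return s->data;
        }
        if (s->used < victim->used)     /* track empty or oldest slot */
            victim = s;
    }
    c->misses++;
    victim->recno = -1;                 /* stays empty if the read fails */
    if (lseek(c->fd, (off_t)recno * RECORD_SIZE, SEEK_SET) == (off_t)-1 ||
        read(c->fd, victim->data, RECORD_SIZE) != RECORD_SIZE)
        return NULL;
    victim->recno = recno;
    victim->used = ++c->tick;
    return victim->data;
}
```

This version is read-only; if the program also updates records, the cache
has to write changed slots back before reusing them, which is part of what
makes the real thing hairy.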

					(-:  Brad Brown  :-)
					bradb at ai.toronto.edu






