Unix/Xenix Software to make an Index

Dr. Robin Lake rbl at nitrex.UUCP
Thu Nov 20 06:17:29 AEST 1986


There have been several requests regarding automatic generation of indices
from text.  I did it once, about 10 years ago, when UNIX was younger and more
forgiving (as was I!).  The logic of the program goes as follows:

In the early (V6) days of UNIX, the tutorial on C included an example program
called "tree".  It built a binary tree of words, counted the occurrences of
each word and then (at EOF on stdin) printed an alphabetical list of words
and their occurrence counts.  A "straightforward" modification of this program
involves changing the data structure of each tree node to hold a list of
page numbers in place of the integer count of occurrences.  As the incoming
text is scanned, keep track of the current page number.  As a word occurs,
enter it into the binary tree (if it's new) and add the page number to that
word's page number list.  At EOF, traverse the tree and print the words and
their associated page numbers.  Voila!  An index!
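
For anyone who wants a starting point, here is a minimal sketch of that
logic in C.  It is a reconstruction from the description above, not the
original tree.c or the lost indexing version.  Everything named here is an
assumption made for illustration: the struct layouts, the functions enter(),
scan() and print_index(), the MAXWORD limit, the rule that a word is a run of
letters folded to lower case, and above all the convention that a form feed
('\f') in the input marks the start of a new page.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXWORD 64              /* assumed limit; longer words are truncated */

struct page {                   /* one entry in a word's page list */
    int number;
    struct page *next;
};

struct node {                   /* one word in the binary tree */
    char *word;
    struct page *first, *last;  /* page list, kept in ascending order */
    struct node *left, *right;
};

static void *xmalloc(size_t n)
{
    void *p = malloc(n);
    if (p == NULL) {
        perror("malloc");
        exit(1);
    }
    return p;
}

/* Append page to n's list unless it is already the most recent entry. */
static void add_page(struct node *n, int page)
{
    struct page *p;

    if (n->last != NULL && n->last->number == page)
        return;
    p = xmalloc(sizeof *p);
    p->number = page;
    p->next = NULL;
    if (n->last != NULL)
        n->last->next = p;
    else
        n->first = p;
    n->last = p;
}

/* Find word in the tree, creating its node if new, and record the page. */
struct node *enter(struct node *root, const char *word, int page)
{
    int cmp;

    if (root == NULL) {
        root = xmalloc(sizeof *root);
        root->word = xmalloc(strlen(word) + 1);
        strcpy(root->word, word);
        root->first = root->last = NULL;
        root->left = root->right = NULL;
        add_page(root, page);
    } else if ((cmp = strcmp(word, root->word)) == 0)
        add_page(root, page);
    else if (cmp < 0)
        root->left = enter(root->left, word, page);
    else
        root->right = enter(root->right, word, page);
    return root;
}

/* In-order traversal, so the words come out alphabetically. */
void print_index(const struct node *n, FILE *out)
{
    const struct page *p;

    if (n == NULL)
        return;
    print_index(n->left, out);
    fprintf(out, "%s", n->word);
    for (p = n->first; p != NULL; p = p->next)
        fprintf(out, "%s%d", p == n->first ? "  " : ", ", p->number);
    fputc('\n', out);
    print_index(n->right, out);
}

/* Read text from in and build the tree; '\f' advances the page (assumed). */
struct node *scan(FILE *in)
{
    struct node *root = NULL;
    char word[MAXWORD];
    int c, len = 0, page = 1;

    do {
        c = getc(in);
        if (c != EOF && isalpha(c)) {
            if (len < MAXWORD - 1)
                word[len++] = (char)tolower(c);
        } else {
            if (len > 0) {          /* a word just ended */
                word[len] = '\0';
                root = enter(root, word, page);
                len = 0;
            }
            if (c == '\f')
                page++;
        }
    } while (c != EOF);
    return root;
}
```

A main() that calls print_index(scan(stdin), stdout) turns this into a
filter.  A real index would also want a stop list for words like "the" and
"and", which this sketch does not attempt.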

The problem came when we ran out of memory on the PDP-11/45 we used then (at
a very different institution).  We never took the time to work out the problem
of writing (sub)trees out to disk files and then combining them at EOF.
(Sounds like a great homework assignment for a Data Structures course!)
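
One possible line of attack, offered only as a sketch and not as anything
that was ever built: when memory fills, dump the tree in alphabetical order
to a "run" file, free it, and carry on; at EOF, merge the sorted runs.  The
merge step might look like the following.  The run file format (one line per
word, the word followed by its space-separated page numbers), the function
name merge_runs() and the MAXLINE limit are all hypothetical.  It also
assumes a run is only written out at a page boundary, so that the earlier
run "a" and the later run "b" never share a page number.

```c
#include <stdio.h>
#include <string.h>

#define MAXLINE 1024            /* assumed upper bound on one index line */

/* Read one line into buf, dropping the newline; return 0 at EOF. */
static int next_line(FILE *f, char *buf)
{
    if (fgets(buf, MAXLINE, f) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}

/* Compare only the leading words of two index lines. */
static int word_cmp(const char *a, const char *b)
{
    size_t la = strcspn(a, " "), lb = strcspn(b, " ");
    int c = strncmp(a, b, la < lb ? la : lb);

    if (c != 0)
        return c;
    return (la > lb) - (la < lb);   /* shorter word sorts first */
}

/* Merge two sorted runs; a must cover earlier pages than b. */
void merge_runs(FILE *a, FILE *b, FILE *out)
{
    char la[MAXLINE], lb[MAXLINE];
    int ha = next_line(a, la), hb = next_line(b, lb);
    int c;

    while (ha && hb) {
        c = word_cmp(la, lb);
        if (c < 0) {
            fprintf(out, "%s\n", la);
            ha = next_line(a, la);
        } else if (c > 0) {
            fprintf(out, "%s\n", lb);
            hb = next_line(b, lb);
        } else {
            /* Same word in both runs: a's pages, then b's pages. */
            fprintf(out, "%s%s\n", la, lb + strcspn(lb, " "));
            ha = next_line(a, la);
            hb = next_line(b, lb);
        }
    }
    for (; ha; ha = next_line(a, la))   /* copy whatever is left over */
        fprintf(out, "%s\n", la);
    for (; hb; hb = next_line(b, lb))
        fprintf(out, "%s\n", lb);
}
```

With more than two runs, the same routine can be applied repeatedly, merging
each new run into the accumulated result.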

Source (highly commented) to tree.c is available via e-mail on request.
If enough (N > ?) requests come in, I'll post it.
Sorry, but the indexing version is not available, having gone to bit heaven
years ago.

Questions to:
Robin Lake
Standard Oil R&D
(216)-581-5976
cbatt!nitrex!rbl
dexvax!cwruecmp!nitrex!rbl


