v08i076: Soundex spelling-checker, Part01/02

sources-request at mirror.UUCP sources-request at mirror.UUCP
Sat Feb 21 06:56:16 AEST 1987


Submitted by: Barry Brachman <brachman at cs.ubc.cdn>
Mod.sources: Volume 8, Issue 76
Archive-name: sp/Part01


Sp is a soundex-based dictionary search program that uses the
dbm routines.  Included are Makefiles, man pages, and an Mlisp interface.

Barry Brachman
Dept. of Computer Science
Univ. of British Columbia
Vancouver, B.C. V6T 1W5

.. {ihnp4!alberta, uw-beaver}!ubc-vision!ubc-cs!brachman
brachman at cs.ubc.cdn
brachman%ubc.csnet at csnet-relay.arpa
brachman at ubc.csnet

#! /bin/sh
# This is a shell archive.  Remove anything before this line,
# then unpack it by saving it in a file and typing "sh file".
# If all goes well, you will see the message "End of archive 1 (of 2)."
# Contents:  README Makefile Makefile.newdbm calcsoundex.c dbm.bug
#   dbm.diffs dbmstuff.c misc.c mksp.c sp.1 MANIFEST
PATH=/bin:/usr/bin:/usr/ucb; export PATH
echo shar: extracting "'README'" '(5017 characters)'
if test -f 'README' ; then 
  echo shar: will not over-write existing file "'README'"
else
sed 's/^X//' >README <<'@//E*O*F README//'
X
XHere are a pair of programs that might be of some use to those who have
Xtrouble with spelling.
X
XThe first program, sp, accepts your tentative or approximate
Xspelling of a word as input and produces a list of words.
XIf the correct spelling of the word appears in one of the dictionaries used,
Xit is likely that it appears in the output list.
XNote that this is different from the UNIX 'spell' command that
Xtells you which words in a document do not appear in the dictionary.
X
XThe second program, mksp, lets you maintain your own dictionary of troublesome
Xwords.
X
X=====
XTo run sp you'll need:
X	- the Unix dbm routines, old or new (4.3BSD)
X
XNot required, but very useful:
X	- the source to the old dbm routines if you don't have the new ones
X	  or your dbm routines don't have dbmclose() (check your man page for
X	  dbm(3X) to see if you've got dbmclose())
X	- /usr/dict/words plus any other large list of words you might have
X
X=====
XI apologize for the complexity of the following guide.  It is due to the
Xpossibility of 4 different dbm configurations:  4.3 style dbm, Sun style dbm
Xwith the dbmclose() routine, "old" (4.2BSD/V7) dbm with source and without
Xsource.
X
X1. The program assumes that a char is 8 bits and an int is at least 16 bits.
X   I've avoided using shorts.
X
X2. Note the following if you are using the old dbm routines that *don't* have
X   dbmclose():
X   The "old" dbm routines that don't have dbmclose() don't work properly if you
X   do more than one dbminit().  If you have source code, you can apply the
X   diffs so that multiple dbminit() calls will work, allowing
X   multiple dictionaries to be used by sp, although you can still only access
X   one dbm at a time.  If you do not have source then you can still use
X   sp/mksp except you must change MAXDICT (in sp.h) to 1 and edit
X   Makefile.newdbm as indicated there.  You will only be able to use one
X   dictionary.  I'm including a bug report that came off the net for the old
X   dbm routines.  This bug patch has been included in dbm.diffs but is
X   surrounded by #ifdef BUGFIX.
X
X   If you're applying the patches to the old dbm code, make a copy of dbm.c
X   and dbm.h.  Apply the patches by:
X	patch < dbm.diffs
X   or by hand (Larry Wall's patch program is in the mod.sources archive).
X
X3. Note the following if you are using the old dbm routines that *do* have
X   dbmclose() (e.g., Sun 2 and Sun 3):
X   Edit Makefile.newdbm and uncomment the two lines indicated.  Make using
X   Makefile.newdbm (see below).
X
X4. Check sp.h and adjust for local conditions.  You might also edit sp.1
X   to reflect your local configuration.
X
X5. I've tried to make it easy to change the key used for retrieving from
X   the dbm.  The routines to make and disassemble a key are in misc.c.
X   I want to keep the key as small as possible since dictionaries tend to
X   be rather large.  I've used a vector of unsigned chars for the key because
X   I didn't want to have to deal with various lengths of shorts and ints on
X   different hardware.
X
X6. If you are using the "new" dbm routines (e.g., those in 4.3BSD that allow
X   multiple simultaneously open dbm's), if you have dbmclose(), or if you have
X   the old dbm routines without the dbm source then do:
X	make -f Makefile.newdbm
X   otherwise do:
X   	make
X
X   Then move sp, mksp, and calcsoundex to a public directory.  Copy sp.1 to
X   where you keep man pages for such programs (you might also link mksp.1 and
X   calcsoundex.1 to sp.1).
X
X7. If you are using Gosling EMACS, copy sp.ml (the MLISP interface to sp) into
X   a public EMACS library.  I haven't tried to convert sp.ml to work with
X   gnuemacs.  Put the documentation (sp.9) where appropriate on your system
X   (you may need to edit the FILES section).
X
X8. You should create a public library using /usr/dict/words, e.g.:
X	mksp -a -v /usr/public/lib/sp.dict < /usr/dict/words
X   The path of this dictionary should appear in DEFAULT_SPPATH (sp.h). Users
X   should be made aware of the public version so they don't make their own copy.
X
X9. dbm doesn't seem to work between a Sun and VAX across NFS.  Too bad.
X   (It does work between Sun's.)
X   Use rsh with the dictionary list on the command line.
X
X10. The programs have been tested on Sun 3/160 (4.2BSD 3.0), VAX 750 (4.3BSD),
X    using both the new and old dbm routines.
X
X11. I have a dictionary of 35K words (350Kb) that do not appear in
X    /usr/dict/words.  The only way I have of circulating it is on a
X    double-sided Atari ST or Mac disk (single-sided if ARC'ed).  If you are
X    interested send me a message.  Perhaps it could be archived somewhere
X    (any volunteers?).
X
X12. Reference: Knuth, D.E. The Art of Computer Programming, Volume 3/Sorting
X    and Searching, 1973, pp.391-392.
X
X13. If you find any bugs please notify me rather than posting to the net.
X
XEnjoy!
X
X-----
XBarry Brachman
XDept. of Computer Science
XUniv. of British Columbia
XVancouver, B.C. V6T 1W5
X
X.. {ihnp4!alberta, uw-beaver}!ubc-vision!ubc-cs!brachman
Xbrachman at cs.ubc.cdn
Xbrachman%ubc.csnet at csnet-relay.arpa
Xbrachman at ubc.csnet
X
@//E*O*F README//
if test 5017 -ne "`wc -c <'README'`"; then
    echo shar: error transmitting "'README'" '(should have been 5017 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'Makefile'" '(482 characters)'
if test -f 'Makefile' ; then 
  echo shar: will not over-write existing file "'Makefile'"
else
sed 's/^X//' >Makefile <<'@//E*O*F Makefile//'
X
X# Makefile for systems using modified dbm routines
X
XCFLAGS=-O
X
Xall:	sp mksp
X
Xsp:	sp.o dbmstuff.o dbm.o misc.o
X	cc ${CFLAGS} -o sp sp.o dbmstuff.o misc.o dbm.o
X
Xmksp:	mksp.o dbmstuff.o dbm.o misc.o
X	cc ${CFLAGS} -o mksp mksp.o dbmstuff.o misc.o dbm.o
X
Xcalcsoundex:	calcsoundex.c
X	cc ${CFLAGS} -o calcsoundex calcsoundex.c
X
X# define BUGFIX if you want the fix included
Xdbm.o:	dbm.h dbm.c
X	cc -c ${CFLAGS} dbm.c
X
X.c.o:
X	cc -c ${CFLAGS} $?
X
Xclean:
X	rm -f sp.o mksp.o dbmstuff.o dbm.o
X
@//E*O*F Makefile//
if test 482 -ne "`wc -c <'Makefile'`"; then
    echo shar: error transmitting "'Makefile'" '(should have been 482 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'Makefile.newdbm'" '(850 characters)'
if test -f 'Makefile.newdbm' ; then 
  echo shar: will not over-write existing file "'Makefile.newdbm'"
else
sed 's/^X//' >Makefile.newdbm <<'@//E*O*F Makefile.newdbm//'
X
X# Makefile for systems with the new dbm routines (e.g., 4.3BSD systems),
X# those with a dbm library containing dbmclose() (e.g., Sun 2 and Sun 3),
X# and those with the old dbm without source
X
XCFLAGS=-O -DNEWDBM
XLIB=
X
X# If you are using the old dbm routines with dbmclose(), uncomment
X# the following two lines (otherwise comment them out)
X# CFLAGS=-O -DHAS_CLOSE
X# LIB=-ldbm
X
X# If you are using the old dbm routines without dbmclose(), uncomment
X# the following two lines (otherwise comment them out)
X# CFLAGS=-O
X# LIB=-ldbm
X
Xall:	sp mksp
X
Xsp:	sp.o dbmstuff.o misc.o sp.h
X	cc ${CFLAGS} -o sp sp.o dbmstuff.o misc.o ${LIB}
X
Xmksp:	mksp.o dbmstuff.o sp.h
X	cc ${CFLAGS} -o mksp mksp.o dbmstuff.o misc.o ${LIB}
X
Xcalcsoundex:	calcsoundex.c
X	cc ${CFLAGS} -o calcsoundex calcsoundex.c
X
X.c.o:
X	cc -c ${CFLAGS} $?
X
Xclean:
X	rm -f sp.o mksp.o dbmstuff.o dbm.o
X
@//E*O*F Makefile.newdbm//
if test 850 -ne "`wc -c <'Makefile.newdbm'`"; then
    echo shar: error transmitting "'Makefile.newdbm'" '(should have been 850 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'calcsoundex.c'" '(2455 characters)'
if test -f 'calcsoundex.c' ; then 
  echo shar: will not over-write existing file "'calcsoundex.c'"
else
sed 's/^X//' >calcsoundex.c <<'@//E*O*F calcsoundex.c//'
X/* vi: set tabstop=4 : */
X
X/*
X * calcsoundex - calculate soundex codes
X *
X * Permission is given to copy or distribute this program provided you
X * do not remove this header or make money off of the program.
X *
X * Please send comments and suggestions to:
X * Barry Brachman
X * Dept. of Computer Science
X * Univ. of British Columbia
X * Vancouver, B.C. V6T 1W5
X *
X * .. {ihnp4!alberta, uw-beaver}!ubc-vision!ubc-cs!brachman
X * brachman at cs.ubc.cdn
X * brachman%ubc.csnet at csnet-relay.arpa
X * brachman at ubc.csnet
X */
X
X#include <stdio.h>
X#include <ctype.h>
X
X#include "sp.h"
X
Xchar word[MAXWORDLEN + 2];
X
Xchar soundex_code_map[26] = {
X/***	 A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P	***/ 
X		 0, 1, 2, 3, 0, 1, 2, 0, 0, 2, 2, 4, 5, 5, 0, 1,
X
X/***	 Q  R  S  T  U  V  W  X  Y  Z			***/
X		 2, 6, 2, 3, 0, 1, 0, 2, 0, 2
X};
X
Xmain(argc, argv)
Xint argc;
Xchar **argv;
X{
X	register int c, i, soundex_length, digit_part, previous_code;
X	int ch, len, vflag;
X	short soundex;
X	char *gets();
X
X	vflag = 0;
X	if (argc > 2 || (argc == 2 && strcmp(argv[1], "-v"))) {
X		fprintf(stderr, "Usage: calcsoundex [-v]\n");
X		exit(1);
X	}
X	if (argc > 1)
X		vflag = 1;
X
X	while (fgets(word, sizeof(word), stdin) != (char *) NULL) {
X		len = strlen(word);
X		if (word[len - 1] != '\n') {
X			fprintf(stderr, "calcsoundex: Word too long: %s", word);
X			while ((ch = getchar()) != '\n')	/* flush rest of line */
X				putc(ch, stderr);
X			putc('\n', stderr);
X			continue;
X		}
X		word[--len] = '\0';
X		if (len > MAXWORDLEN) {
X			fprintf(stderr, "calcsoundex: Word too long: %s\n", word);
X			continue;
X		}
X
X		for (i = 0; word[i] != '\0'; i++) {
X			if (isupper(word[i]))
X				word[i] = tolower(word[i]);
X		}
X		if (!isalpha(word[0]))
X			continue;
X
X		digit_part = 0;
X		soundex_length = 0;
X		previous_code = soundex_code_map[word[0] - 'a'];
X		for (i = 1; word[i] != '\0' && soundex_length < 3; i++) {
X			if (!isalpha(word[i]))
X				continue;
X			c = soundex_code_map[word[i] - 'a'];
X			if (c == 0 || previous_code == c) {
X				previous_code = c;
X				continue;
X			}
X			digit_part = digit_part * 10 + c;
X			previous_code = c;
X			soundex_length++;
X		}
X		while (soundex_length++ < 3)
X			digit_part *= 10;
X		soundex = digit_part << 5 + word[0] - 'a';
X		printf("%c", word[0]);
X		if (digit_part < 100)
X			putchar('0');
X		if (digit_part < 10)
X			putchar('0');
X		if (digit_part == 0)
X			putchar('0');
X		else
X			printf("%d", digit_part);
X		if (vflag)
X			printf(" %s", word);
X		putchar('\n');
X	}
X	putchar('\n');
X	exit(0);
X}
X
@//E*O*F calcsoundex.c//
if test 2455 -ne "`wc -c <'calcsoundex.c'`"; then
    echo shar: error transmitting "'calcsoundex.c'" '(should have been 2455 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'dbm.bug'" '(1839 characters)'
if test -f 'dbm.bug' ; then 
  echo shar: will not over-write existing file "'dbm.bug'"
else
sed 's/^X//' >dbm.bug <<'@//E*O*F dbm.bug//'
XArticle 770 of net.bugs.4bsd:
XPath: ubc-cs!ubc-ean!alberta!ihnp4!mhuxn!mhuxr!ulysses!allegra!mit-eddie!genrad!panda!talcott!harvard!seismo!elsie!ado
XFrom: ado at elsie.UUCP (Arthur David Olson)
XSubject: 4.?bsd dbm's store(k,c) dies if (i=k.dsize+c.dsize)==1018||i==1019--FIX
XDate: Wed, 10-Apr-85 09:02:36 PST
XDate-Received: Thu, 11-Apr-85 13:03:49 PST
XOrganization: NIH-LEC, Bethesda, MD
X
XIndex:		lib/libdbm/dbm.c Fix
X
XDescription:
X	4.?bsd dbm's store function misbehaves if the sum of the key data
X	size and content data size is either 1018 or 1019.
X
XRepeat-By:
X	Compile this program with the "dbm" library:
X
X		typedef struct {
X			char *	dptr;
X			int	dsize;
X		} datum;
X
X		char	buf[1024];
X
X		main(argc, argv)
X		int	argc;
X		char *	argv[];
X		{
X			int	result;
X			datum	key;
X			datum	content;
X
X			key.dptr = content.dptr = buf;
X			key.dsize = atoi(argv[1]);
X			content.dsize = 0;
X			creat("fake.dir", 0600);
X			creat("fake.pag", 0600);
X			dbminit("fake");
X			result = store(key, content);
X			printf("%d\n", result);
X		}
X
X	Then run the program.  If you use commands such as
X		a.out 0
X		a.out 1
X		...
X		a.out 1017
X	things go swimmingly.  If you use commands such as
X		a.out 1019
X		a.out 1020
X		...
X	an error message is (correctly) produced.  But if you use either
X	the command
X		a.out 1018
X	or
X		a.out 1019
X	things go wild.
X
XFix:
X	As usual, the trade secret status of the code involved precludes a
X	clearer posting.  The fix is to change one line in "dbm.c"; it
X	causes an error message to be produced in the 1018/1019 cases:
X
X		#ifdef OLDVERSION
X			if(key.dsize+dat.dsize+2*sizeof(short) >= PBLKSIZ) {
X		#else
X			if(key.dsize+dat.dsize+3*sizeof(short) >= PBLKSIZ) {
X		#endif
X--
XBugs is a Warner Brothers trademark
X--
X	UUCP: ..decvax!seismo!elsie!ado    ARPA: elsie!ado at seismo.ARPA
X	DEC, VAX and Elsie are Digital Equipment and Borden trademarks
X
X
@//E*O*F dbm.bug//
if test 1839 -ne "`wc -c <'dbm.bug'`"; then
    echo shar: error transmitting "'dbm.bug'" '(should have been 1839 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'dbm.diffs'" '(2748 characters)'
if test -f 'dbm.diffs' ; then 
  echo shar: will not over-write existing file "'dbm.diffs'"
else
sed 's/^X//' >dbm.diffs <<'@//E*O*F dbm.diffs//'
XIndex: dbm.c
X*** dbm.orig.c	Thu Nov 27 17:45:38 1986
X--- dbm.c	Thu Nov 27 18:13:11 1986
X***************
X*** 6,16 ****
X--- 6,21 ----
X  #include	<sys/types.h>
X  #include	<sys/stat.h>
X  
X+ static long dbm_access_oldb;
X+ static getbit_oldb;
X+ 
X  dbminit(file)
X  char *file;
X  {
X  	struct stat statb;
X  
X+ 	dbm_access_oldb = -1;
X+ 	getbit_oldb = -1;
X  	dbrdonly = 0;
X  	strcpy(pagbuf, file);
X  	strcat(pagbuf, ".pag");
X***************
X*** 27,36 ****
X  		dirf = open(pagbuf, 0);
X  		dbrdonly = 1;
X  	}
X! 	if(pagf < 0 || dirf < 0) {
X! 		printf("cannot open database %s\n", file);
X  		return(-1);
X- 	}
X  	fstat(dirf, &statb);
X  	maxbno = statb.st_size*BYTESIZ-1;
X  	return(0);
X--- 32,39 ----
X  		dirf = open(pagbuf, 0);
X  		dbrdonly = 1;
X  	}
X! 	if(pagf < 0 || dirf < 0)
X  		return(-1);
X  	fstat(dirf, &statb);
X  	maxbno = statb.st_size*BYTESIZ-1;
X  	return(0);
X***************
X*** 130,136 ****
X--- 133,143 ----
X  	return (0);
X  
X  split:
X+ #ifdef BUGFIX
X+ 	if(key.dsize+dat.dsize+3*sizeof(short) >= PBLKSIZ) {
X+ #else
X  	if(key.dsize+dat.dsize+2*sizeof(short) >= PBLKSIZ) {
X+ #endif
X  		printf("entry too big\n");
X  		return (-1);
X  	}
X***************
X*** 226,232 ****
X  dbm_access(hash)
X  long hash;
X  {
X! 	static long oldb = -1;
X  
X  	for(hmask=0;; hmask=(hmask<<1)+1) {
X  		blkno = hash & hmask;
X--- 233,239 ----
X  dbm_access(hash)
X  long hash;
X  {
X! /***	static long oldb = -1;	***/
X  
X  	for(hmask=0;; hmask=(hmask<<1)+1) {
X  		blkno = hash & hmask;
X***************
X*** 234,245 ****
X  		if(getbit() == 0)
X  			break;
X  	}
X! 	if(blkno != oldb) {
X  		clrbuf(pagbuf, PBLKSIZ);
X  		lseek(pagf, blkno*PBLKSIZ, 0);
X  		read(pagf, pagbuf, PBLKSIZ);
X  		chkblk(pagbuf);
X! 		oldb = blkno;
X  	}
X  }
X  
X--- 241,252 ----
X  		if(getbit() == 0)
X  			break;
X  	}
X! 	if(blkno != dbm_access_oldb) {
X  		clrbuf(pagbuf, PBLKSIZ);
X  		lseek(pagf, blkno*PBLKSIZ, 0);
X  		read(pagf, pagbuf, PBLKSIZ);
X  		chkblk(pagbuf);
X! 		dbm_access_oldb = blkno;
X  	}
X  }
X  
X***************
X*** 247,253 ****
X  {
X  	long bn;
X  	register b, i, n;
X! 	static oldb = -1;
X  
X  	if(bitno > maxbno)
X  		return(0);
X--- 254,260 ----
X  {
X  	long bn;
X  	register b, i, n;
X! /***	static oldb = -1;  ***/
X  
X  	if(bitno > maxbno)
X  		return(0);
X***************
X*** 255,265 ****
X  	bn = bitno / BYTESIZ;
X  	i = bn % DBLKSIZ;
X  	b = bn / DBLKSIZ;
X! 	if(b != oldb) {
X  		clrbuf(dirbuf, DBLKSIZ);
X  		lseek(dirf, (long)b*DBLKSIZ, 0);
X  		read(dirf, dirbuf, DBLKSIZ);
X! 		oldb = b;
X  	}
X  	if(dirbuf[i] & (1<<n))
X  		return(1);
X--- 262,272 ----
X  	bn = bitno / BYTESIZ;
X  	i = bn % DBLKSIZ;
X  	b = bn / DBLKSIZ;
X! 	if(b != getbit_oldb) {
X  		clrbuf(dirbuf, DBLKSIZ);
X  		lseek(dirf, (long)b*DBLKSIZ, 0);
X  		read(dirf, dirbuf, DBLKSIZ);
X! 		getbit_oldb = b;
X  	}
X  	if(dirbuf[i] & (1<<n))
X  		return(1);
@//E*O*F dbm.diffs//
if test 2748 -ne "`wc -c <'dbm.diffs'`"; then
    echo shar: error transmitting "'dbm.diffs'" '(should have been 2748 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'dbmstuff.c'" '(1595 characters)'
if test -f 'dbmstuff.c' ; then 
  echo shar: will not over-write existing file "'dbmstuff.c'"
else
sed 's/^X//' >dbmstuff.c <<'@//E*O*F dbmstuff.c//'
X/* dbmstuff.c */
X
X/*
X * Interface to old and new dbm routines
X */
X
X#include <stdio.h>
X
X#ifndef NEWDBM
X
X#include <dbm.h>
X
X/*ARGSUSED*/
XDBMINIT(path, flags)
Xchar *path;
Xint flags;
X{
X
X	return(dbminit(path));
X}
X
XDBMCLOSE()
X{
X
X#ifdef HAS_CLOSE
X	dbmclose();
X#else
X	close(3);		/* free up the file descriptors */
X	close(4);
X#endif
X}
X
Xdatum
XFETCH(key)
Xdatum key;
X{
X	datum fetch();
X
X	return(fetch(key));
X}
X
Xdatum
XFIRSTKEY()
X{
X
X	return(firstkey());
X}
X
Xdatum
XNEXTKEY(key)
Xdatum key;
X{
X	return(nextkey(key));
X}
X
XSTORE(key, content)
Xdatum key, content;
X{
X
X	return(store(key, content));
X}
X
XREPLACE(key, content)
Xdatum key, content;
X{
X
X	if (delete(key) == -1)
X		return(-1);
X	return(store(key, content));
X}
X
XDELETE(key)
Xdatum key;
X{
X
X	return(delete(key));
X}
X
X#endif !NEWDBM
X
X#ifdef NEWDBM
X
X#include <ndbm.h>
X
Xstatic DBM *current_db = (DBM *) NULL;
X
XDBMINIT(path, flags)
Xchar *path;
Xint flags;
X{
X
X	current_db = dbm_open(path, flags, 0);
X	if (current_db == (DBM *) NULL)
X		return(-1);
X	return(0);
X}
X
XDBMCLOSE()
X{
X
X	if (current_db != (DBM *) NULL) {
X		dbm_close(current_db);
X		current_db = (DBM *) NULL;
X	}
X}
X
Xdatum
XFETCH(key)
Xdatum key;
X{
X
X	return(dbm_fetch(current_db, key));
X}
X
Xdatum
XFIRSTKEY()
X{
X
X	return(dbm_firstkey(current_db));
X}
X
X/*ARGSUSED*/
Xdatum
XNEXTKEY(key)
Xdatum key;
X{
X
X	return(dbm_nextkey(current_db));
X}
X
XREPLACE(key, content)
Xdatum key, content;
X{
X
X	return(dbm_store(current_db, key, content, DBM_REPLACE)); 
X}
X
XSTORE(key, content)
Xdatum key, content;
X{
X
X	return(dbm_store(current_db, key, content, DBM_INSERT));
X}
X
XDELETE(key)
Xdatum key;
X{
X
X	return(dbm_delete(current_db, key));
X}
X
X#endif NEWDBM
@//E*O*F dbmstuff.c//
if test 1595 -ne "`wc -c <'dbmstuff.c'`"; then
    echo shar: error transmitting "'dbmstuff.c'" '(should have been 1595 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'misc.c'" '(3643 characters)'
if test -f 'misc.c' ; then 
  echo shar: will not over-write existing file "'misc.c'"
else
sed 's/^X//' >misc.c <<'@//E*O*F misc.c//'
X/* misc.c */
X
X/* vi: set tabstop=4 : */
X
X#include <ctype.h>
X#include <stdio.h>
X
X#include "sp.h"
X
X/*
X * Special character map that determines what the second character of a word
X * can be; see sp.h
X * May be expanded to contain up to 12 entries plus the terminating entry
X * Must end with an entry of two null bytes
X */
Xstruct spchar_map spchar_map[] = {
X	'\'',	QUOTE_CHAR,
X	'&',	AMPER_CHAR,
X	'.',	PERIOD_CHAR,
X	' ',	SPACE_CHAR,
X	'\0',	'\0'
X};
X
Xmk_key(key, soundex, count)
Xkey_t *key;
Xint soundex;
Xint count;
X{
X
X	key[0] = soundex & 0377;
X	key[1] = ((soundex & 037400) >> 8) | ((count & 03) << 6);
X	key[2] = (count & 01774) >> 2;
X#ifdef DEBUG
X	if (ex_soundex(key) != soundex)
X		fprintf(stderr, "mk_key: soundex failed\n");
X	if (ex_count(key) != count)
X		fprintf(stderr, "mk_key: count failed\n");
X#endif DEBUG
X}
X
Xex_soundex(key)
Xkey_t *key;
X{
X	register int soundex;
X
X	soundex = key[0] & 0377;
X	soundex |= (key[1] & 077) << 8;
X	return(soundex);
X}
X
Xex_count(key)
Xkey_t *key;
X{
X	register int count;
X
X	count = (key[1] & 0300) >> 6;
X	count |= ((key[2] & 0377) << 2);
X	return(count);
X}
X
X/*
Xex_char(key)
Xkey_t *key;
X{
X	int ch;
X
X	ch = (key[1] & 076) >> 1;
X	return(ch + 'a');
X}
X*/
X
X/*
X * Unpack a word given the retrieved word of length len and its soundex
X * Extract the first letter from the soundex code
X * If the length is 1 and if it is marked as a single character word
X * then the marked character will be overlaid with a null
X * otherwise a null will be appended to the string
X * Adjust for upper case leading character if necessary
X * Return address of the copy
X */
Xchar *
Xmk_word(p, len, s)
Xchar *p;
Xint len, s;
X{
X	register char *q, ch;
X	static char word[MAXWORDLEN + 2];
X
X	q = word;
X	if (len == 1 && (*p & SINGLE_CHAR)) {
X		*(q + 1) = '\0';
X		len = 0;
X	}
X	else
X		*(q + len + 1) = '\0';
X
X	/*
X	 * Extract the first character from the soundex and
X	 * adjust case
X	 */
X	if (*p & UPPER_CHAR)
X		ch = (s & 037) + 'A';
X	else
X		ch = (s & 037) + 'a';
X	*q++ = ch;
X
X	if (len != 0) {				/* if more than one char adjust second char */
X		ch = *p & MASK_CHAR;
X		if (ch < 26)
X			ch += 'a';
X		else if (ch < 52)
X			ch = ch - 26 + 'A';
X		else if ((ch = fromspchar(ch)) == '\0') {
X			fprintf(stderr, "Bogus second char in mk_word\n");
X			exit(1);
X		}
X		*q++ = ch;
X		p++;
X		len--;
X	}
X
X	while (len-- > 0)
X		*q++ = *p++;
X	return(word);
X}
X
X/*
X * Convert the second character of a word to a special character code
X * Return null if there is no mapping
X */
Xtospchar(ch)
Xchar ch;
X{
X	register struct spchar_map *m;
X
X	for (m = spchar_map; m->spchar != '\0'; m++)
X		if (ch == m->spchar)
X			break;
X	return(m->code);
X}
X
X/*
X * Convert from the special character code to the ASCII code
X * Return null if there is no mapping
X */
Xfromspchar(ch)
Xchar ch;
X{
X	register struct spchar_map *m;
X
X	for (m = spchar_map; m->spchar != '\0'; m++)
X		if (ch == m->code)
X			break;
X	return(m->spchar);
X}
X
X/*
X * Compare two strings, independent of case, given their lengths
X */
X/*
Xstrnmatch(str1, len1, str2, len2)
Xchar *str1, *str2;
Xint len1, len2;
X{
X	register char ch1, ch2;
X
X	if (len1 != len2)
X		return(0);
X	while (len1-- > 0) {
X		ch1 = *str1++;
X		ch2 = *str2++;
X		if (ch1 != ch2) {
X			if (isupper(ch1))
X				ch1 = tolower(ch1);
X			if (isupper(ch2))
X				ch2 = tolower(ch2);
X			if (ch1 != ch2)
X				return(0);
X		}
X	}
X	return(1);
X}
X*/
X
X/*
X * Compare two strings, independent of case
X */
Xstrmatch(p, q)
Xchar *p, *q;
X{
X	register char ch1, ch2;
X
X	while (1) {
X		ch1 = *p++;
X		ch2 = *q++;
X		if (ch1 == '\0' || ch2 == '\0')
X			break;
X		if (ch1 != ch2) {
X			if (isupper(ch1))
X				ch1 = tolower(ch1);
X			if (isupper(ch2))
X				ch2 = tolower(ch2);
X			if (ch1 != ch2)
X				break;
X		}
X	}
X	return(ch1 - ch2);
X}
X
@//E*O*F misc.c//
if test 3643 -ne "`wc -c <'misc.c'`"; then
    echo shar: error transmitting "'misc.c'" '(should have been 3643 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'mksp.c'" '(12797 characters)'
if test -f 'mksp.c' ; then 
  echo shar: will not over-write existing file "'mksp.c'"
else
sed 's/^X//' >mksp.c <<'@//E*O*F mksp.c//'
X/* vi: set tabstop=4 : */
X
X/*
X * mksp - make soundex dictionary
X * Version 1.3,  December 1986
X *
X * If <soundexfile.{dir, pag}> do not exist, try to create them
X * If they do exist and are not empty, then words will be added
X * from the standard input
X * Only when words are being added to an existing database are duplicate words
X * ignored
X * Valid words (words beginning with an alphabetic) are stored as given but
X * comparisons for duplicates is case independent.
X * Non-alphabetic characters are ignored in computing the soundex
X *
X * Permission is given to copy or distribute this program provided you
X * do not remove this header or make money off of the program.
X *
X * Please send comments and suggestions to:
X * Barry Brachman
X * Dept. of Computer Science
X * Univ. of British Columbia
X * Vancouver, B.C. V6T 1W5
X *
X * .. {ihnp4!alberta, uw-beaver}!ubc-vision!ubc-cs!brachman
X * brachman at cs.ubc.cdn
X * brachman%ubc.csnet at csnet-relay.arpa
X * brachman at ubc.csnet
X */
X
X#include <ctype.h>
X#include <errno.h>
X#include <sys/types.h>
X#include <sys/file.h>
X#include <sys/stat.h>
X#include <stdio.h>
X
X#ifdef NEWDBM
X#include <ndbm.h>
X#include <sys/file.h>
X#else !NEWDBM
X#include <dbm.h>
X#endif NEWDBM
X
X#include "sp.h"
X
X#define NEW_DICT			0
X#define OLD_DICT			1
X
X#define VFREQ				1000	/* frequency for verbose messages */
X
X#define streq(X, Y)			(!strcmp(X, Y))
X#define strneq(X, Y, N)		(!strncmp(X, Y, N))
X
X#define USAGE				"Usage: mksp -tad [-v[#]] <soundexfile>"
X
Xint map[26][667];
X
Xdatum FETCH(), FIRSTKEY(), NEXTKEY();
X
Xkey_t keyvec[KEYSIZE];
Xkey_t *key = keyvec;
X
X/*
X * Soundex codes
X * The program depends upon the numbers zero through six being used
X * but this can easily be changed
X */
Xchar soundex_code_map[26] = {
X/***	 A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P	***/ 
X		 0, 1, 2, 3, 0, 1, 2, 0, 0, 2, 2, 4, 5, 5, 0, 1,
X
X/***	 Q  R  S  T  U  V  W  X  Y  Z			***/
X		 2, 6, 2, 3, 0, 1, 0, 2, 0, 2
X};
X
Xint digit_part;
X
Xint aflag, dflag, tflag, vflag;
X
Xchar *mk_word();
X
Xmain(argc, argv)
Xint argc;
Xchar **argv;
X{
X	register int i;
X	char *file;
X
X	if (argc != 3 && argc != 4) {
X		fprintf(stderr, "%s\n", USAGE);
X		exit(1);
X	}
X	aflag = dflag = tflag = vflag = 0;
X	file = (char *) NULL;
X
X	for (i = 1; i < argc; i++) {
X		if (streq(argv[i], "-a"))
X			aflag = 1;
X		else if (streq(argv[i], "-d"))
X			dflag = 1;
X		else if (streq(argv[i], "-t"))
X			tflag = 1;
X		else if (strneq(argv[i], "-v", 2)) {
X			if (isdigit(argv[i][2]))
X				vflag = atoi(&argv[i][2]);
X			else
X				vflag = 1;
X		}
X		else if (file == (char *) NULL)
X			file = argv[i];
X		else {
X			fprintf(stderr, "%s\n", USAGE);
X			exit(1);
X		}
X	}
X
X	if (file == (char *) NULL || (tflag + aflag + dflag) != 1) {
X		fprintf(stderr, "%s\n", USAGE);
X		exit(1);
X	}
X
X	if (aflag) {
X		addwords(file);
X		if (vflag > 1) {
X			register int j, total;
X			int m, max;
X
X			fprintf(stderr, "Counters:\n");
X			for (i = 0; i < 26; i++) {
X				total = max = map[i][0];
X				for (j = 1; j < 667; j++) {
X					total += (m = map[i][j]);
X					if (m > max)
X						max = m;
X				}
X				if (max > 0)
X					fprintf(stderr, "%c: max %d total %d\n", 'a'+i, max, total);
X			}
X		}
X	}
X	else if (dflag)
X		deletewords(file);
X	else if (tflag)
X		prcontents(file);
X	exit(0);
X}
X
X/*
X * Add words read from stdin to the database
X * The key is the 3 digit soundex code for the word plus a disambiguating
X * counter.  Different counter values are used for words with the same soundex
X * code.  The maximum counter value is MAXCOUNT.  If the counter overflows then
X * we lose, but given at least 8 bits this seems unlikely.
X */
Xaddwords(name)
Xchar *name;
X{
X	register int c, count, delete, duplicate, i, len;
X	register int *p;
X	int ch, s, status;
X	char wcopy[MAXWORDLEN + 2], word[MAXWORDLEN + 2];
X	datum dbm_key, dbm_content;
X
X	status = setup(name);
X	if (DBMINIT(name, O_RDWR) == -1) {
X		fprintf(stderr, "mksp: Can't initialize\n");
X		exit(1);
X	}
X
X	for (i = 0; i < 26; i++)
X		for (c = 0; c < 667; c++)
X			map[i][c] = 0;
X
X	dbm_key.dptr = (char *) key;
X	dbm_key.dsize = KEYSIZE;
X
X	count = 0;
X
X	while (fgets(word, sizeof(word), stdin) != (char *) NULL) {
X		len = strlen(word);
X		if (word[len - 1] != '\n') {
X			fprintf(stderr, "mksp: Word too long: %s", word);
X			while ((ch = getchar()) != '\n')	/* flush rest of line */
X				putc(ch, stderr);
X			putc('\n', stderr);
X			continue;
X		}
X		word[--len] = '\0';
X		if (len > MAXWORDLEN) {
X			fprintf(stderr, "mksp: Word too long: %s\n", word);
X			continue;
X		}
X
X		if ((s = soundex(word, 3)) == BAD_WORD) {
X			if (vflag)
X				fprintf(stderr, "Ignoring bad word: %s\n", word);
X			continue;
X		}
X		ch = (isupper(word[0]) ? tolower(word[0]) : word[0]) - 'a';
X		p = &(map[ch][digit_part]);
X
X		/*
X		 * If words are being added to an old dictionary,
X		 * check for duplication and watch for a deleted entry
X		 * The reason for only checking for duplicates in old dictionaries is
X		 * that usually when you're creating a new dictionary the words are
X		 * already sorted and unique and the creation of a large dictionary is
X		 * slow enough already.
X		 */
X		duplicate = 0;
X		delete = -1;			/* an 'impossible' counter */
X		if (status == OLD_DICT) {
X			c = 0;
X			while (1) {
X				mk_key(key, s, c);
X				dbm_content = FETCH(dbm_key);
X				if (dbm_content.dptr == 0)
X					break;
X
X				if (!IS_DELETED(dbm_content)) {
X					char *str;
X
X					str = mk_word(dbm_content.dptr, dbm_content.dsize, s);
X					if (strmatch(word, str) == 0) {
X						duplicate = 1;
X						if (vflag)
X							fprintf(stderr, "duplicate: %s\n", word);
X						break;
X					}
X				}
X				else if (delete < 0)	/* choose delete nearest front */
X					delete = c;
X
X				if (++c > MAXCOUNT) {
X					fprintf(stderr, "mksp: Counter overflow\n");
X					fprintf(stderr, "soundex: %c%d\n", ch+'a', b10(digit_part));
X					exit(1);
X				}
X			}
X			if (duplicate)
X				continue;
X			*p = c;
X		}
X		if (*p > MAXCOUNT) {
X			fprintf(stderr, "mksp: Counter overflow\n");
X			fprintf(stderr, "soundex: %c%d\n", ch+'a', b10(digit_part));
X			exit(1);
X		}
X		mk_key(key, s, *p);
X		*p = *p + 1;
X		strcpy(wcopy, word);
X		if (len == 1) {
X			if (isupper(wcopy[0]))
X				wcopy[0] |= UPPER_CHAR;
X			wcopy[0] |= SINGLE_CHAR;
X			dbm_content.dptr = wcopy;
X		}
X		else {
X			if (isupper(wcopy[1]))
X				wcopy[1] = wcopy[1] - 'A' + 26;
X			else if (islower(wcopy[1]))
X				wcopy[1] = wcopy[1] - 'a';
X			else if ((wcopy[1] = tospchar(wcopy[1])) == '\0') {
X				fprintf(stderr, "Bogus second char: can't happen!\n");
X				exit(1);
X			}
X			if (isupper(wcopy[0]))
X				wcopy[1] = wcopy[1] | UPPER_CHAR;
X			dbm_content.dptr = wcopy + 1;
X			len--;
X		}
X		dbm_content.dsize = len;					/* null not stored */
X		if (delete < 0) {
X			if (STORE(dbm_key, dbm_content) == -1) {
X				fprintf(stderr, "mksp: Can't store\n");
X				exit(1);
X			}
X		}
X		else {
X			if (vflag)
X				fprintf(stderr, "reusing: %s\n", word);
X			mk_key(key, s, delete);
X			if (REPLACE(dbm_key, dbm_content) == -1) {
X				fprintf(stderr, "mksp: Can't replace\n");
X				exit(1);
X			}
X		}
X		count++;
X		if (vflag > 1)
X			fprintf(stderr, "%5d: %s(%d)\n", count, word, ex_count(key));
X		if (vflag && (count % VFREQ) == 0)
X			fprintf(stderr, "%5d: %s\n", count, word);
X	}
X	if (vflag)
X		fprintf(stderr, "%d words\n", count);
X	DBMCLOSE();
X}
X
X/*
X * Print out everything
X */
Xprcontents(name)
Xchar *name;
X{
X	register int s;
X	datum dbm_key, dbm_content;
X
X	if (DBMINIT(name, O_RDONLY) == -1)
X		exit(1);
X
X	dbm_key = FIRSTKEY();
X	while (dbm_key.dptr != NULL) {
X		dbm_content = FETCH(dbm_key);
X		if (dbm_content.dptr == 0)
X			break;						/* ??? */
X
X		if (vflag)
X			printf("%3d. ", ex_count((key_t *) dbm_key.dptr));
X		if (IS_DELETED(dbm_content)) {
X			if (vflag)
X				printf("(deleted)\n");
X		}
X		else {
X			s = ex_soundex((key_t *) dbm_key.dptr);
X			printf("%s\n", mk_word(dbm_content.dptr, dbm_content.dsize, s));
X		}
X		dbm_key = NEXTKEY(dbm_key);
X	}
X	DBMCLOSE();
X}
X
X/*
X * When words are deleted they must be marked as such rather than deleted
X * using DELETE().  This is because the sequence of counters must remain
X * continuous.  If DELETE() is used then any entries with the same soundex
X * but with a larger counter value would not be accessible.  This approach
X * does cost some extra space but if an addition is made to the chain then
X * a deleted counter slot will be reused.  Also, the storage used by the word
X * should be made available to dbm.  This could be improved somewhat
X * by actually using DELETE() on the last entry of the chain.
X */
Xdeletewords(name)
Xchar *name;
X{
X	register int c, ch, len, s;
X	register char *p;
X	char word[MAXWORDLEN + 2];
X	datum dbm_key, dbm_content;
X
X	if (DBMINIT(name, O_RDWR) == -1)
X		exit(1);
X
X	while (fgets(word, sizeof(word), stdin) != (char *) NULL) {
X		len = strlen(word);
X		if (word[len - 1] != '\n') {
X			fprintf(stderr, "mksp: Word too long: %s", word);
X			while ((ch = getchar()) != '\n')	/* flush rest of line */
X				putc(ch, stderr);
X			putc('\n', stderr);
X			continue;
X		}
X		word[--len] = '\0';
X		if (len > MAXWORDLEN) {
X			fprintf(stderr, "mksp: Word too long: %s\n", word);
X			continue;
X		}
X
X		if ((s = soundex(word, 3)) == BAD_WORD) {
X			if (vflag)
X				fprintf(stderr, "Bad word: %s\n", word);
X			continue;
X		}
X
X		c = 0;
X		while (1) {
X			dbm_key.dptr = (char *) key;
X			dbm_key.dsize = KEYSIZE;
X			mk_key(key, s, c);
X			dbm_content = FETCH(dbm_key);
X			if (dbm_content.dptr == NULL) {
X				if (vflag)
X					fprintf(stderr, "Not found: %s\n", word);
X				break;
X			}
X
X			if (!IS_DELETED(dbm_content)) {
X				p = mk_word(dbm_content.dptr, dbm_content.dsize, s);
X				if (strmatch(word, p) == 0) {
X					/*
X					 * Aside:
X					 * Since dptr points to static storage it must be reset
X					 * if we want to retain the old content (content.dptr=word)
X					 * This took a while to determine...
X					 * Anyhow, since there is no need to store the old word
X					 * we free up the space
X					 */
X					dbm_content.dptr = "";
X					dbm_content.dsize = 0;
X					if (REPLACE(dbm_key, dbm_content) == -1)
X						fprintf(stderr, "mksp: delete of '%s' failed\n", word);
X					else if (vflag) {
X						if (vflag > 1)
X							fprintf(stderr, "%d. %s ", c, p);
X						fprintf(stderr, "deleted\n");
X					}
X					break;
X				}
X				else if (vflag > 1)
X					fprintf(stderr, "%d. %s\n", c, p);
X			}
X			else if (vflag > 1)
X				fprintf(stderr, "%d. (deleted)\n", c);
X
X			if (++c > MAXCOUNT) {
X				ch = isupper(word[0]) ? tolower(word[0]) : word[0];
X				fprintf(stderr, "mksp: Counter overflow\n");
X				fprintf(stderr, "soundex: %c%d\n", ch, b10(digit_part));
X				exit(1);
X			}
X		}
X	}
X	DBMCLOSE();
X}
X
X/*
X * Setup the dictionary files if necessary
X */
Xsetup(name)
Xchar *name;
X{
X	register int s1, s2;
X
X	s1 = check_dict(name, ".dir");
X	s2 = check_dict(name, ".pag");
X	if (s1 == NEW_DICT && s2 == NEW_DICT)
X		return(NEW_DICT);
X	return(OLD_DICT);
X}
X
X/*
X * Check if a dictionary file exists:
X *	- if not, try to create it
X *	- if so, see if it is empty
X * Return NEW_DICT if an empty file exists,
X * OLD_DICT if a non-empty file exists
X * Default mode for new files is 0666
X */
Xcheck_dict(name, ext)
Xchar *name, *ext;
X{
X	register int len, s;
X	char *filename;
X	struct stat statbuf;
X	char *malloc();
X	extern int errno;
X
X	len = strlen(name) + strlen(ext) + 1;
X	filename = (char *) malloc((unsigned) len);
X	if (filename == (char *) NULL) {
X		fprintf(stderr, "mksp: Can't malloc '%s.%s'\n", name, ext);
X		exit(1);
X	}
X	sprintf(filename, "%s%s", name, ext);
X	if (stat(filename, &statbuf) == -1) {
X		if (errno != ENOENT) {
X			perror("mksp");
X			exit(1);
X		}
X		if (creat(filename, 0666) == -1) {
X			perror("mksp");
X			exit(1);
X		}
X		s = NEW_DICT;
X	}
X	else {
X		if (statbuf.st_size == 0)
X			s = NEW_DICT;
X		else
X			s = OLD_DICT;
X	}
X	return(s);
X}
X
X/*
X * Compute an 'n' digit Soundex code for 'word' 
X * As a side effect, leave the digit part of the soundex in digit_part
X *
X * Since the soundex can be considered a base 7 number, if 'n' is:
X *	3	require  9 (10 if base 10) bits for digits
X *	4	require 12 (13) bits
X *	5	require 15 (17) bits
X *	6	require 17 (20) bits
X *
X * The three slightly different versions of this routine should be coalesced.
X */
Xsoundex(word, n)
Xregister char *word;
Xint n;
X{
X	register int c, soundex_length, previous_code;
X	register char *p, *w;
X	char wcopy[MAXWORDLEN + 2];
X
X	if (!IS_VALID(word))
X		return(-1);
X
X	strcpy(wcopy, word);
X	p = w = wcopy;
X
X	while (*p != '\0') {
X		if (isupper(*p))
X			*p = tolower(*p);
X		p++;
X	}
X
X	digit_part = 0;
X	soundex_length = 0;
X	previous_code = soundex_code_map[*w - 'a'];
X	for (p = w + 1; *p != '\0' && soundex_length < n; p++) {
X		if (!isalpha(*p))
X			continue;
X		c = soundex_code_map[*p - 'a'];
X		if (c == 0 || previous_code == c) {
X			previous_code = c;
X			continue;
X		}
X		digit_part = digit_part * 7 + c;
X		previous_code = c;
X		soundex_length++;
X	}
X	while (soundex_length++ < n)
X		digit_part *= 7;
X	return((digit_part << 5) + *w - 'a');
X}
X
Xb10(n)
Xint n;
X{
X	register int b10, s;
X
X	for (b10 = 0, s = 1; n != 0; n /= 7) {
X		b10 += (n % 7) * s;
X		s *= 10;
X	}
X	return(b10);
X}
X
@//E*O*F mksp.c//
if test 12797 -ne "`wc -c <'mksp.c'`"; then
    echo shar: error transmitting "'mksp.c'" '(should have been 12797 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'sp.1'" '(5932 characters)'
if test -f 'sp.1' ; then 
  echo shar: will not over-write existing file "'sp.1'"
else
sed 's/^X//' >sp.1 <<'@//E*O*F sp.1//'
X.TH SP 1-LOCAL "11 December 1986"
X.UC 4
X.SH NAME
Xsp \- give possible spellings
X.br
Xmksp \- maintain sp dictionaries
X.br
Xcalcsoundex \- calculate soundex values
X.SH SYNOPSIS
X.B sp
X[
X.B -vace
X] [
X.B -f dictionary-list
X] [
X.B word ...
X]
X.br
X.B mksp
X[
X.B -adt
X] [
X.B -v#
X]
X.B dictionary
X.br
X.B calcsoundex
X[
X-v
X]
X.SH DESCRIPTION
X.I Sp
Xtakes one or more words as input and for each word prints
Xa list of possible spellings.
XIf the words are not given on the command line, the program prompts and reads
Xfrom the standard input.
XYou must know the first letter of the word.
XUpper case is mapped to lower case.
XWords must start with an alphabetic character, but any subsequent
Xcharacters need not be alphabetic (but see the limitation below on words that
Xmay appear in the dectionary).
XBlanks are allowed within a word.
X.PP
XUp to ten dictionaries previously created by
X.I mksp
Xmay be specified by a command line argument and an environment variable.
XThe name of a dictionary is specified by a pathname,
Xnot including the suffix (.dir or .pag).
XA list of dictionaries consists of one or more colon separated dictionary
Xnames.
XThe environment variable SPPATH may be set to a dictionary list.
XIf a command line dictionary list is given in addition to the SPPATH variable,
Xall dictionaries are used.
XIf no dictionaries are specified the program looks for default dictionaries.
X.PP
XTo reduce the size of the word list, certain heuristics are used.
XNormally, all words
X.I sp
Xconsiders to be a "satisfactory" match are printed.
XThe \fB-c\fR option causes only close
Xmatches or an exact match to be printed.
XThe \fB-e\fR option only prints exact matches.
XThe \fB-a\fR option causes all words matched to be printed.
XThe output is sorted alphabetically and indicators are printed beside each
Xword:
X.sp 2
X.in +0.5i
X.nf
X.na
X   X == exact match
X.br
X   ! == close match
X.br
X   * == good match
X.br
X ' ' == matched
X.in -0.5i
X.fi
X.ad
X.sp 2
XDuplicated words are not removed from the listing.
X.PP
XIf the \fB-v\fR flag is given,
X.I sp
Xbecomes verbose.
X.PP
X.I Mksp
Xis a program to maintain dictionaries for use with
X.I sp.
XThe \fB-a\fR option is used to create a new dictionary or to add
Xwords to an existing dictionary.
XThe words to be put in the dictionary are read from the standard input,
Xone per line.
XThe \fB-v\fR flag (which may immediately be followed by an optional number)
Xcauses some information to be printed as words are processed.
XA non-flag argument to
X.I mksp
Xis assumed to be the prefix of the name of the dictionary files.
XThe dictionary consists of two files, one with a ".dir" suffix and one with
Xa ".pag" suffix (see \fBdbm\fR(3X)).
XIf these files do not exist,
X.I mksp
Xwill create both.
XThe words need not be sorted.
XThere should not be duplicates in the word list when creating a new dictionary
Xbut when words are added to an existing dictionary, duplicates are ignored.
XUpper case letters are stored in the dictionary but
X.I mksp
Xmaps upper case to lower case when checking for duplicate words.
X.I Sp
Xis case insensitive when searching the database.
XThe first character of a word must be alphabetic and
Xthe second character (if present) must be either alphabetic, a single quote,
Xan ampersand, a period, or a blank.
XThere is no restriction on the third and subsequent characters.
X.PP
XThe \fB-d\fR option is used to delete words from the specified dictionary.
XThe words are read from the standard input.
XIf a word is not found in the dictionary, no message is printed.
XThe comparison is case insensitive.
X.PP
XThe \fB-t\fR option prints the contents of the specified dictionary.
XThe words are not sorted.
X.PP
X.I Calcsoundex
Xreads words from the standard input (one per line) and prints the soundex
Xcode corresponding to each word on stdout.
XWith the \fB-v\fR flag,
X.I calcsoundex
Xalso echoes each word.
X.SH EXAMPLE
X.in +0.5i
X.na
X.nf
X% mksp -a mydictionary
Xaardvark
Xprecipitation
X<more words>
Xzyzz
X<^D>
X% sp -c -f mydictionary:herdictionary
XWord? propogate
X!  1. perfect
X!  2. perfectible
X<more words>
X!  7. propagate
X!  8. proposition
X<more words>
XWord? <^D>
X.fi
X.ad
X.in -0.5i
X.SH FILES
X.nf
X.na
X<dictionary.pag>, <dictionary.dir>      dictionary data base
X/usr/local/lib/sp.dict.[12]             default dictionaries
X.fi
X.ad
X.SH LIMITATIONS
XNo more than 10 dictionaries may be specified.
XThe maximum length of a word is 50 characters.
XThe program can return up to 400 matches taking up a maximum of 20480 bytes.
XThe limitation on what the second character of a word can be is due to
Xthe algorithm used to compress the dictionaries; there is some room in the
Xdata structure for a few other characters to become valid.
XDbm doesn't work between a Sun and VAX across NFS so you can't share a
Xdictionary between a VAX and a Sun.
X.PP
XThere is a limit on the number of words having the same soundex code that can
Xappear in a single dictionary.
XThis value should be at least 256 (on a VAX/Sun it is 1024), but it depends
Xon how the program has been configured locally.
XIf you come up against this limitation you can split your dictionary; e.g.,
Xextract every second word from the big dictionary to make a new dictionary,
Xthen delete the words from the big dictionary.
XThe following pipeline can be used to determine the number of times each
Xsoundex code appears in a list of words (one per line):
X.sp 2
X.ti +0.5i
Xcalcsoundex | sort | uniq -c | sort -r -0n
X.SH "SEE ALSO"
Xspell(1), uniq(1), dbm(3X), ndbm(3)
X.br
XDonald E. Knuth, The Art of Computer Programming, Vol. 3,
XSorting and Searching, Addison-Wesley, pp. 391-392, 1973.
X.SH AUTHOR
XBarry Brachman
X.br
XDept. of Computer Science
X.br
XUniversity of British Columbia
X.SH BUGS
XYou may not agree on what constitutes a match.
XYou are likely to have to create your own dictionary as the UNIX
Xdictionary is far from complete.  In particular, the suffixes
Xhave been removed from most words.
XThe limitations mentioned above are arbitrary.
XThe limitation on the second character of a word is disgusting.
X
@//E*O*F sp.1//
if test 5932 -ne "`wc -c <'sp.1'`"; then
    echo shar: error transmitting "'sp.1'" '(should have been 5932 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'MANIFEST'" '(1093 characters)'
if test -f 'MANIFEST' ; then 
  echo shar: will not over-write existing file "'MANIFEST'"
else
sed 's/^X//' >MANIFEST <<'@//E*O*F MANIFEST//'
X  File Name             Kit #   Description
X-----------------------------------------------------------
X MANIFEST                  1   This shipping list
X Makefile                  1   Makefile for use with old dbm routines and source
X Makefile.newdbm           1   Makefile for 4.3BSD dbm, dbmclose(), or old dbm w/o source
X README                    1   Configuration instructions, etc.
X calcsoundex.c             1   Program to calculate soundex codes
X dbm.bug                   1   A bug report for the old dbm routines
X dbm.diffs                 1   Diffs to be applied to old dbm routines
X dbmstuff.c                1   Interface to old and new dbm routines
X misc.c                    1   Miscellaneous support routines
X mksp.c                    1   Program to maintain dictionaries
X sp.1                      1   Man page for sp/mksp/calcsoundex
X sp.9                      2   Man page for EMACS interface to sp
X sp.c                      2   Program to search dictionaries
X sp.h                      2   Header file
X sp.ml                     2   Mlisp code for EMACS interface to sp
@//E*O*F MANIFEST//
if test 1093 -ne "`wc -c <'MANIFEST'`"; then
    echo shar: error transmitting "'MANIFEST'" '(should have been 1093 characters)'
fi
fi # end of overwriting check
echo shar: "End of archive 1 (of 2)."
cp /dev/null ark1isdone
DONE=true
for I in 1 2; do
    if test ! -f ark${I}isdone; then
        echo "You still need to run archive ${I}."
        DONE=false
    fi
done
case $DONE in
    true)
        echo "You have run both archives."
        echo 'See the README'
        ;;
esac
##  End of shell archive.
exit 0



More information about the Mod.sources mailing list