soundex algorithm wanted

Chris Torek chris at umcp-cs.UUCP
Fri Sep 5 00:06:47 AEST 1986


In article <1239 at whuxl.UUCP> mike at whuxl.UUCP (BALDWIN) writes:
>	register char	c, lc, prev = '0';

`register int' generates better code on my compiler, and still works.

>		if (isalpha(*name)) {

A nit: test isascii(*name) first, since on many systems isalpha() is
only defined for arguments for which isascii() is true.

>			lc = tolower(*name);

Watch out!  Some implementations of tolower() return garbage when the
argument is not an upper-case letter, so check isupper(c) first.
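
To make the two cautions above concrete, here is a minimal sketch of a
guarded conversion.  The name safe_tolower is made up for illustration;
it is not from the quoted article or from any library, and the range
check assumes a 7-bit ASCII character set.

```c
#include <ctype.h>

/*
 * Hypothetical helper: convert c to lower case only when it is
 * known to be an upper-case ASCII letter; otherwise return it
 * unchanged.  The range check stands in for isascii(), which is
 * not part of ISO C.
 */
int
safe_tolower(int c)
{
	if (c >= 0 && c <= 127 && isupper(c))
		return tolower(c);
	return c;
}
```

Anything that is not an upper-case ASCII letter (digits, lower-case
letters, values above 127) passes through untouched.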

Anyway, assuming that the basic algorithm is ... sound, I would
change the driver routine as follows:

#include <ctype.h>

#define	SDXLEN	4

char *
soundex(name)
	register char *name;
{
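	/* result buffer; overwritten by each call */
	static char buf[SDXLEN+1];
	/* code digit for each letter 'a'..'z'; '0' marks ignored letters */
	static char codes[] = "01230120022455012623010202";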
	register int c, i = 0, prev;
	char *strcpy();

#ifdef lint
	/* lint cannot tell that prev is set before used */
	prev = 0;
#endif
	/* default result; short names keep the trailing zeros as padding */
	(void) strcpy(buf, "a000");
	while ((c = *name++) != 0 && i < SDXLEN) {
		/*
		 * Throw out non-alphabetics, and convert upper case
		 * to lower.
		 */
		if (!isascii(c) || !isalpha(c))
			continue;
		if (isupper(c))
			c = tolower(c);
		/*
		 * Non-first characters must translate to non-zero codes
		 * that are different from the previous code; throw out
		 * those that translate to zero or to prev.
		 */
		if (i > 0 && ((c = codes[c - 'a']) == '0' || c == prev))
			continue;
		buf[i++] = prev = c;
	}
	return (buf);
}
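
For anyone who wants to try the routine with a compiler that insists on
prototypes, the following is a rough restatement of the same logic in
ANSI style.  It is only a sketch: the name soundex_sketch is
hypothetical, the `c > 127' test stands in for isascii(), and indexing
the table with c - 'a' assumes the letters are contiguous, as in ASCII.

```c
#include <ctype.h>
#include <string.h>

#define	SDXLEN	4

/* Sketch only: same logic as the routine above, ANSI prototype. */
const char *
soundex_sketch(const char *name)
{
	static char buf[SDXLEN + 1];
	/* code digit for each letter 'a'..'z'; '0' marks ignored letters */
	static const char codes[] = "01230120022455012623010202";
	int c, i = 0, prev = 0;

	strcpy(buf, "a000");
	while ((c = (unsigned char)*name++) != 0 && i < SDXLEN) {
		/* throw out non-ASCII and non-alphabetic characters */
		if (c > 127 || !isalpha(c))
			continue;
		if (isupper(c))
			c = tolower(c);
		/* after the first letter, keep only new non-zero codes */
		if (i > 0 && ((c = codes[c - 'a']) == '0' || c == prev))
			continue;
		buf[i++] = prev = c;
	}
	return buf;
}
```

With this sketch, "Robert" and "Rupert" both come out as "r163", and an
empty name falls back to "a000".  The result lives in a static buffer,
so copy it before calling the function again.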
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris at umcp-cs		ARPA:	chris at mimsy.umd.edu


