Soundex algorithm

Chris Torek chris at mimsy.UUCP
Tue Jul 12 07:51:27 AEST 1988


[I have deleted groups comp.theory and comp.ai since Soundex has little
to do with these]

In article <12520 at sunybcs.UUCP> stewart at sunybcs.uucp (Norman R. Stewart)
writes:
>2: Apply the following rules to produce a code of one letter and
>   three numbers.
>   A: The first letter of the word becomes the initial character
>      in the code.
>   B: When two or more letters from the same group occur together
>      only the first is coded.
>   C: If two letters from the same group are separated by an H or
>      a W, code only the first.
>   D: Group 7 letters are never coded (this does not include the
>      first letter in the word, which is always coded).
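Rules A through D can be sketched in C as shown below. This is a minimal sketch, not Stewart's code. His letter table (step 1) is not quoted here, so it assumes the standard American Soundex table, with the vowels, Y, H and W standing in for his "group 7". H and W are checked separately because rule C lets them bridge a run, while a vowel ends one. `soundex` and `soundex_group` are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: map a letter to its group.  The table is the
 * standard American Soundex one (assumed; the post's own table is not
 * quoted).  Group 0 plays the role of the post's "group 7": never coded. */
static int soundex_group(int c)
{
    static const char *const groups[] = {
        "AEIOUYHW", "BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"
    };
    int g;

    if (c == '\0')
        return -1;              /* strchr would match the terminator */
    c = toupper(c);
    for (g = 0; g < 7; g++)
        if (strchr(groups[g], c) != NULL)
            return g;
    return -1;                  /* not a letter */
}

/* Encode word as one letter and three digits, zero-padded, into out
 * (which must hold at least 5 chars).  out is "" if word has no letters. */
void soundex(const char *word, char *out)
{
    const unsigned char *p = (const unsigned char *)word;
    int n = 0, last, g;

    assert(word != NULL && out != NULL);
    while (*p != '\0' && soundex_group(*p) < 0)
        p++;                        /* skip leading non-letters */
    if (*p == '\0') {
        out[0] = '\0';
        return;
    }
    out[n++] = (char)toupper(*p);   /* rule A: first letter kept */
    last = soundex_group(*p);       /* so "Pf..." does not code the F */
    for (p++; *p != '\0' && n < 4; p++) {
        g = soundex_group(*p);
        if (g < 0)
            continue;               /* ignore non-letters */
        if (toupper(*p) == 'H' || toupper(*p) == 'W')
            continue;               /* rule C: H, W do not end a run */
        if (g == 0) {
            last = 0;               /* rule D: vowel not coded, but ends a run */
            continue;
        }
        if (g != last)
            out[n++] = (char)('0' + g); /* rule B: first of a run only */
        last = g;
    }
    while (n < 4)
        out[n++] = '0';             /* pad to four symbols */
    out[n] = '\0';
}
```

With this sketch, "Robert" and "Rupert" both encode as R163, and "Tymczak" as T522 (the vowel A splits the group-2 letters Z and K, so K is coded again). Coding stops after three digits, and shorter results are padded with zeros.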

[I thought Soundex codes were usually fixed at four symbols.]

What if more than two letters from the same group are separated by H
or W?  For instance: FDHTWTHTWL.  Is this encoded as F334 or as F34?
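For comparison, the American Soundex rules as usually published (e.g., by the U.S. National Archives) code any chain of same-group letters broken only by H or W as a single digit. That gives the second answer, padded to four symbols. Here is a trace under that reading, which is not necessarily the one Stewart intended:

    F  D  H  T  W  T  H  T  W  L
    F  3  .  3  .  3  .  3  .  4    (. = H or W, never coded)
    F  3                       4    (rule C applied across the whole D...T chain)
    => F34, padded to F340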

The table has L=4, R=6; I find this surprising, as both R and L are
semivowels and are easily confused by speakers whose native language
does not distinguish them (e.g., many native speakers of Japanese).
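The letter-to-digit table can be kept as a 26-character string indexed by letter. This sketch assumes the standard American Soundex table, in which L and R do land in separate groups. Under the full rules, Hale and Hare would therefore code as H400 and H600. `soundex_digit` is a hypothetical name, and the indexing assumes ASCII, where A..Z are contiguous.

```c
#include <ctype.h>

/* Assumed standard American Soundex digits for A..Z, in order;
 * '0' marks letters that are never coded.  Note L -> '4', R -> '6'. */
static const char soundex_digits[] = "01230120022455012623010202";

/* Return the digit for letter c (an unsigned char value), or '\0' if c
 * is not an ASCII letter.  Assumes ASCII, where A..Z are contiguous. */
char soundex_digit(int c)
{
    c = toupper(c);
    if (c < 'A' || c > 'Z')
        return '\0';
    return soundex_digits[c - 'A'];
}
```

Putting L and R in one group, as the objection above suggests might suit some speakers, would take a one-character change to the string (e.g., giving R the digit 4). The resulting codes would no longer match standard Soundex, though.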
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris at mimsy.umd.edu	Path:	uunet!mimsy!chris


