Soundex (sounds like)

Rick Jones rick_jones at f616.n713.fido.oz
Thu Dec 14 21:35:24 AEST 1989


Original to: ing at hades.oz
G'day.

Rather than give you the code, here's the algorithm (it's a lot simpler
to do than most people think):

A soundex code is a four-character representation based on the way a name
sounds rather than the way it is spelled. Theoretically, using this system, 
you should be able to index a name so that it can be found no matter how it 
was spelled. The system was developed by Margaret K. Odell and Robert C. 
Russell (see U.S. Patents 1261167 [1918] and 1435663 [1922]). 

Every soundex code consists of a letter and three numbers, such as B525. The 
letter is always the first letter of the surname. The numbers are assigned 
this way: 

  1  =  b,p,f,v
  2  =  c,s,k,g,j,q,x,z
  3  =  d,t
  4  =  l
  5  =  m,n
  6  =  r
  disregard  -  a,e,i,o,u,w,y,h
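In C, the table above comes down to a lookup from letter to digit. This is a minimal sketch, assuming ASCII input. `soundex_digit` is a hypothetical name, not from the post. It returns 0 for the disregarded letters and for anything that isn't a letter.

```c
#include <ctype.h>

/* Map a letter to its soundex digit, per the table above.
 * Returns 0 for the disregarded letters a,e,i,o,u,w,y,h and for
 * any non-letter. Case-insensitive; assumes ASCII input. */
int soundex_digit(int ch)
{
    switch (tolower((unsigned char)ch)) {
    case 'b': case 'p': case 'f': case 'v':
        return 1;
    case 'c': case 's': case 'k': case 'g':
    case 'j': case 'q': case 'x': case 'z':
        return 2;
    case 'd': case 't':
        return 3;
    case 'l':
        return 4;
    case 'm': case 'n':
        return 5;
    case 'r':
        return 6;
    default:
        return 0;   /* a,e,i,o,u,w,y,h, or not a letter */
    }
}
```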

To figure out a surname's code, do this:           JOHNSON
   - Keep the first letter; eliminate any
     a,e,i,o,u,w,y,h after it                      JNSN
   - Write the first letter, as is, followed by
     the codes for the remaining letters           JNSN = J525

No matter how long or short the surname is, the soundex code is always the 
first letter of the name followed by three numbers. If you have coded the 
first letter and three numbers but still have more letters in the name, 
ignore them. If you have run out of letters in the name before you have three 
numbers, then add zeroes to the code: 

   WASHINGTON = WSNGTN = W252 (ignore the ending TN)
   KUHNE      = KN     = K500 (add zeroes to the end)
   YE         = Y      = Y000 (add zeroes to the end)
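The fixed-length rule amounts to "keep at most three digits, then pad with zeroes." Here is a small sketch of just that final step. It assumes the digits for the letters after the first have already been worked out as a string; for WASHINGTON that is "25235". The name `soundex_fixlen` and its signature are illustrative, not from the post.

```c
#include <stddef.h>

/* Build a four-character soundex code from the first letter and the
 * digits coded for the rest of the name. Digits beyond the third are
 * ignored; if there are fewer than three, zeroes are added.
 * `out` must hold at least 5 chars (4 + terminating NUL). */
char *soundex_fixlen(char first, const char *digits, char out[5])
{
    size_t n = 0;

    out[0] = first;
    while (n < 3 && digits[n] != '\0') {   /* keep at most three digits */
        out[1 + n] = digits[n];
        n++;
    }
    while (n < 3) {                        /* pad short codes with '0' */
        out[1 + n] = '0';
        n++;
    }
    out[4] = '\0';
    return out;
}
```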

Any double letters side by side should be treated as one letter. For example 
LLOYD is coded as if it were spelled LOYD. GUTIERREZ is coded as if it were 
GUTIEREZ. 

You may have different letters side by side that have the same code value,
for example PFISTER (P and F are both 1) and JACKSON (C, K and S are all 2).
These letters should be treated as one letter. PFISTER is coded as PSTR (P236)
and JACKSON is coded as JCN (J250). J and C are both 2 as well, but they are
not side by side, so both are kept.

Thus, variations in spelling and misspellings should produce the same code
number.

This material is based on "Beginning Your Genealogical Research in the
National Archives," courtesy ROOTS-BBS, CA, Brian Mavrogeorge, sysop.

If you have any trouble coding the above (hardly likely, I'd imagine),
let me know and I'll write you a piece of code compatible with SVID.
Unfortunately, my routine uses lower-level routines proprietary to my
library and it would be useless to give it to you without several other
support routines.

Hope this is of some help.

Rick Jones.



---
 * Origin: /\/\onitor \/\/orld (~~Sydney Australia~~) (Opus 3:713/616)


