fuzzy strcmp

Istvan Mohos istvan at hhb.UUCP
Fri Dec 22 22:38:39 AEST 1989



tchrist at convexe.uucp (Tom Christiansen @ Convex Computer) writes:
>I'm looking for an algorithm that would allow me to determine
>whether two strings were similar.  Thus 
>
>	"abcde" !~ "xyzzy"
>	"this old man can read" =~ "that old man can't read"
>
>... perhaps just
>    float   strfzcmp(string1,string2)

I must confess, my first reaction was: thank God, Tom's finally found
a problem he can't solve in Perl.  :-)

You may want to try running the *diff* algorithm along the individual
characters of the two strings (rather than applying it to successive
lines of two files); the ratio of the number of failed chars (those
diff would flag as deleted or inserted) to the combined byte count of
the two strings is a dandy float in the range 0. to 1.
Thus,
    strfzcmp("abcde","xyzzy") --> 1.
    strfzcmp("this old man can read","that old man can't read") --> .136363..
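
For what it's worth, here is a rough sketch of how that ratio could be
computed without calling diff itself.  It assumes the "failed chars" are
the characters left outside a longest common subsequence (LCS) of the
two strings, which is what a minimal character-level diff would report
as deletions plus insertions.  The name strfzcmp comes from Tom's
posting; the const char * arguments, the 0. result for two empty
strings, and the -1. return on allocation failure are my own
assumptions, not anything from the original request.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Sketch only: fraction of characters that a character-level diff
 * would flag, i.e. (len1 + len2 - 2*LCS) / (len1 + len2).
 * Returns 0. for identical strings, 1. for nothing in common.
 * Assumption: -1. is returned if memory cannot be allocated.
 */
float strfzcmp(const char *s1, const char *s2)
{
    size_t n = strlen(s1), m = strlen(s2), i, j, lcs;
    size_t *prev, *cur, *tmp;

    if (n + m == 0)
        return 0.0f;                /* assumption: two empty strings match */

    /* two rows of the LCS table are enough; calloc zeroes row 0 */
    prev = calloc(m + 1, sizeof *prev);
    cur = calloc(m + 1, sizeof *cur);
    if (prev == NULL || cur == NULL) {
        free(prev);
        free(cur);
        return -1.0f;
    }

    for (i = 1; i <= n; i++) {
        cur[0] = 0;
        for (j = 1; j <= m; j++) {
            if (s1[i - 1] == s2[j - 1])
                cur[j] = prev[j - 1] + 1;       /* extend a common run */
            else
                cur[j] = prev[j] > cur[j - 1] ? prev[j] : cur[j - 1];
        }
        tmp = prev; prev = cur; cur = tmp;      /* finished row becomes prev */
    }
    lcs = prev[m];

    free(prev);
    free(cur);
    return (float)(n + m - 2 * lcs) / (float)(n + m);
}
```

For the second pair above, the common subsequence is 19 characters long
("th", " old man can", " read"), so 6 of the 44 bytes fail.  The table
costs O(n*m) time, which is fine for short strings; a real diff uses
cleverer methods for long inputs.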

-- 
        Istvan Mohos
        ...uunet!pyrdc!pyrnj!hhb!istvan
        HHB Systems 1000 Wyckoff Ave. Mahwah NJ 07430 201-848-8000
====================================================================



More information about the Comp.unix.wizards mailing list