Detecting type of file in a program
David Goodenough
dg at lakart.UUCP
Fri Feb 10 03:37:11 AEST 1989
tale at pawl.rpi.edu (David C Lawrence) sez:
Stuff about file(1) deleted
> (Aside: I am curious how it determines something is English
> text rather than just ascii text.)
I'd hazard a guess that it looks at the letter distribution. English
has fairly well-defined relative frequencies of letters. So you
count how many E's, T's, etc. occur and see how close you are to the
"standard". If you are close, say it's English; otherwise say it's ASCII.
This may be wrong - those in the know are welcome to correct me - but it's
one possibility that could be made to work.
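
To make the guess concrete, here is a minimal sketch of the letter-counting idea in Python. It is not a description of what file(1) actually does. The reference table holds ballpark English letter percentages, and the names letter_distance and looks_english and the 3.0 threshold are all assumptions picked for illustration.

```python
from collections import Counter

# Approximate percentages for the most common English letters.
# These are ballpark reference figures (an assumption), not values from file(1).
ENGLISH_FREQ = {
    "e": 12.7, "t": 9.1, "a": 8.2, "o": 7.5, "i": 7.0, "n": 6.7,
    "s": 6.3, "h": 6.1, "r": 6.0, "d": 4.3, "l": 4.0, "u": 2.8,
}


def letter_distance(text):
    """Mean absolute gap, in percentage points, between the text's
    letter frequencies and the reference table above."""
    letters = [c for c in text.lower() if "a" <= c <= "z"]
    if not letters:
        # No letters at all: as far from English as it gets.
        return float("inf")
    counts = Counter(letters)
    total = len(letters)
    gaps = [abs(100.0 * counts[ch] / total - pct)
            for ch, pct in ENGLISH_FREQ.items()]
    return sum(gaps) / len(gaps)


def looks_english(text, threshold=3.0):
    """Call it English if the distribution is close to the 'standard'.
    The threshold is an arbitrary, untuned assumption."""
    return letter_distance(text) <= threshold
```

Short inputs give noisy counts, so a check like this would only be trustworthy on a reasonably large chunk of the file, and the threshold would need tuning against real samples.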
--
dg at lakart.UUCP - David Goodenough +---+
IHS | +-+-+
....... !harvard!xait!lakart!dg +-+-+ |
AKA: dg%lakart.uucp at xait.xerox.com +---+