Detecting type of file in a program

David Goodenough dg at lakart.UUCP
Fri Feb 10 03:37:11 AEST 1989


tale at pawl.rpi.edu (David C Lawrence) sez:
Stuff about file(1) deleted
> (Aside: I am curious how it determines something is English
> text rather than just ascii text.)

I'd hazard a guess that it looks at letter distributions. English has
fairly well-defined letter frequencies, so you count how many E's, T's,
etc. occur and see how close the counts come to the "standard" ratios.
If they are close, call it English; otherwise call it ascii.

This may be wrong - those in the know are welcome to correct me, but it's
one possibility that could be made to work.
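
For what it's worth, here is a minimal sketch of the counting idea in Python.
It is not taken from the file(1) source and makes no claim about what file(1)
really does. The frequency table holds rounded, approximate figures for the
twelve most common English letters, the threshold of 3.0 is an arbitrary
choice, and the names letter_distance and looks_english are made up for
illustration.

```python
from collections import Counter
import string

# Approximate percentages for the most common English letters.
# Assumption: rounded reference figures, good enough for a rough test.
ENGLISH_FREQ = {
    "e": 12.7, "t": 9.1, "a": 8.2, "o": 7.5, "i": 7.0, "n": 6.7,
    "s": 6.3, "h": 6.1, "r": 6.0, "d": 4.3, "l": 4.0, "u": 2.8,
}


def letter_distance(text):
    """Mean absolute gap, in percentage points, between the letter
    frequencies observed in text and ENGLISH_FREQ. Returns None when
    text contains no ASCII letters at all."""
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    if not letters:
        return None
    counts = Counter(letters)
    total = len(letters)
    gaps = [
        abs(100.0 * counts[ch] / total - pct)
        for ch, pct in ENGLISH_FREQ.items()
    ]
    return sum(gaps) / len(gaps)


def looks_english(text, threshold=3.0):
    """Guess 'English' when the observed distribution is close to the
    reference. The threshold is arbitrary (an assumption), not tuned."""
    distance = letter_distance(text)
    return distance is not None and distance <= threshold
```

A caller would report "English text" when looks_english() returns true and
fall back to "ascii text" otherwise. Short inputs give noisy counts, so a
real tool would presumably want a minimum sample size before trusting the
answer.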
-- 
	dg at lakart.UUCP - David Goodenough		+---+
						IHS	| +-+-+
	....... !harvard!xait!lakart!dg			+-+-+ |
AKA:	dg%lakart.uucp at xait.xerox.com		  	  +---+


