UNIX PC Voice Power: unlocking the untapped capabilities? (*LONG*)

Charles Brunow clb at loci.UUCP
Wed Nov 9 07:35:19 AEST 1988


In article <540 at icus.islp.ny.us>, lenny at icus.islp.ny.us (Lenny Tropiano) writes:
> I've posed this before, but now I have proof that it's possible.  I've
> spoken with various people (some who were on the original Voice Power
> development team) who couldn't give me "specifics" but said it was
> possible.   Voice Recognition, how?  That's the question... Since my
> involvement with the Voice Power product on the UNIX pc, I've learned
> a lot.  Learning bits and pieces about CODEC's, PCM (pulse code modulation),
> DSP's (digital signal processors), sub bands, mu-law, a-law, etc...  It's
> still very technical, and way over my head, but I'm learning...  [side note:
> if there is anyone out there who can give me help in the above topics
> please feel free to contact me].
> 
	If you don't already know this stuff pat then you're years away
	from speech recognition (SR).  The coding method and companding
	are basic stuff which you can find in telco references.  There's
	a bit in "Transmission Systems for Communications", by "Members
	of the Technical Staff - Bell Telephone Laboratories",  and you
	could profit from "Digital Signal Processing" by Alan V. Oppenheim
	and Ronald W. Schafer (Prentice-Hall, 1975).  There are bound
	to be other references which are basically equivalent.
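
	For a feel of what the companding does, here is a minimal
	sketch of the continuous mu-law curve in Python.  The function
	names and the simple 8-bit rounding step are my own, for
	illustration only; real telco codecs use a segmented
	(piecewise-linear) approximation of this curve, not this
	exact formula.

```python
import math

MU = 255.0  # mu value used in North American telephony


def mulaw_compress(x, mu=MU):
    """Map a linear sample in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mulaw_expand(y, mu=MU):
    """Inverse of mulaw_compress: companded value back to a linear sample."""
    return math.copysign(((1.0 + mu) ** abs(y) - 1.0) / mu, y)


def quantize8(v):
    """Hypothetical uniform quantizer: round to evenly spaced steps of 1/127."""
    return round(v * 127) / 127.0


if __name__ == "__main__":
    quiet = 0.01  # a low-level sample, where companding helps most
    plain = quantize8(quiet)
    companded = mulaw_expand(quantize8(mulaw_compress(quiet)))
    print("uniform error:  ", abs(plain - quiet))
    print("companded error:", abs(companded - quiet))
```

	The point of the curve is that quiet samples get far more of
	the available codes than loud ones, so the error on a
	low-level signal is much smaller than with uniform steps.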

	Another source might be the app notes put out by TI a few years
	back when they were trying to convince the world that they had
	the best speech stuff.  Some of it is very specific, like how
	the vocal tract simulations work (schematics).  My archives are
	too confused to find copies so maybe someone else can lay their
	hands on a copy for you.

	Ultimately the process probably consists of determining the
	coefficients for the filter nodes, looking for the best match
	against the set of known words, and then updating the stored
	coefficients, either completely or with a damping factor for
	learning.  The
	problem is that knowing that doesn't get you much closer to
	actually doing it.  There is a lot of raw data (assume an 8 kHz
	sample rate) which has to be reduced to a form which can be
	efficiently processed while keeping enough data to distinguish
	similar words from different people.  Many people have spent
	lots of time on it without significant breakthroughs.
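
	To make the outline above concrete, here is a rough sketch in
	Python of one way to read it: linear-prediction coefficients
	from the autocorrelation method (Levinson-Durbin recursion),
	a nearest-template lookup, and a damped template update.  The
	20 ms frame, the order of 10, the Euclidean distance on raw
	coefficients and all of the names are my assumptions, not
	anything Voice Power is known to do.  A usable recognizer
	would also have to compare whole sequences of frames, which
	this does not attempt.

```python
import math

SAMPLE_RATE = 8000   # Hz, as assumed in the text
FRAME_LEN = 160      # 20 ms of samples per frame (assumption)
LPC_ORDER = 10       # coefficients kept per frame (assumption)


def autocorr(frame, max_lag):
    """Autocorrelation of one frame for lags 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order=LPC_ORDER):
    """Levinson-Durbin: a[1..order] such that x[n] ~ sum(a[k] * x[n-k])."""
    r = autocorr(frame, order)
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:          # silent or degenerate frame: stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        nxt = a[:]
        nxt[i] = k
        for j in range(1, i):
            nxt[j] = a[j] - k * a[i - j]
        a = nxt
        err *= 1.0 - k * k
    return a[1:]


def distance(a, b):
    """Euclidean distance between two coefficient vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_match(features, templates):
    """Return the known word whose stored coefficients are closest."""
    return min(templates, key=lambda word: distance(features, templates[word]))


def update_template(old, new, damping):
    """damping=1.0 replaces the template completely; smaller values blend."""
    return [(1.0 - damping) * o + damping * n for o, n in zip(old, new)]


if __name__ == "__main__":
    # Synthetic decaying signal standing in for one frame of speech.
    frame = [0.9 ** n for n in range(FRAME_LEN)]
    feats = lpc(frame, order=2)
    templates = {"yes": [0.9, 0.0], "no": [-0.5, 0.2]}
    word = best_match(feats, templates)
    templates[word] = update_template(templates[word], feats, damping=0.25)
    print(word, templates[word])
```

	Even in this toy form the data reduction is visible: 160 raw
	samples per frame collapse to a handful of coefficients.  The
	hard part, as noted, is keeping enough in those numbers to
	tell similar words and different speakers apart.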

-- 
			CLBrunow - KA5SOF
	clb at loci.uucp, loci at csccat.uucp, loci at killer.dallas.tx.us
	  Loci Products, POB 833846-131, Richardson, Texas 75083


