Floating-point non-exactness

Fri Aug 3 17:19:13 AEST 1990

Rick Jones asks

> Is there a general solution to this problem without continuously rounding
> everything?  I believe it's possible by applying a "maximum reliable 
> precision" concept to FP numbers.

I don't see how to do this sensibly.  Presumably, you'd want to say
that fpcmp(x,y)==0 if x and y are within a few units in the last
place.  How few?  Consider first the case where x and y are the
representations of literals xl and yl.  Assume the machine rounds
literals deterministically to the nearest representable FP number.
Then it would be reasonable to define

	fpcmp(x,y) == 0 iff x == y

(Here, "x" means "the FP number representing xl", etc.)  This is
sensible because, indeed, x != y ==>  xl != yl.  

Suppose now that x is the representation of the literal xl and y is
the representation of the result of yl1 * yl2. On VAXes and IEEE
machines, it would then be sensible to say that fpcmp(x,y) == 0 iff x
and y differed by no more than (I think) 1 ulp (maybe 2; it's late).
This continues; every new expression supplying x and y determines a
new "sensible".

> Can you ensure that "fpcmp (a, b)" and "fpcmp (a-b, 0.0)" yield the same 
> result when a and b are very close?

The subtraction loses information, so "no".  Let's suppose that we
carry 4 binary digits of significand.  You certainly want to say that
fpcmp(0.125, 0.25) == -1.  But to satisfy your condition, this
requires that fpcmp(-0.125, 0.0) == -1.  That in turn requires that
fpcmp(1.125,1.25) == -1 (since 1.125-1.25 is also -0.125) [these are
all multiples of powers of 2, so the arithmetic is generally exact].
However, 1.125 and 1.25 differ by only one in the last place; thus, you are
forced to consider two FP numbers different if they differ in the last
place---in other words if they aren't == according to built-in FP
equality.

I assume in this response that you want an answer suitable for floating point, 
and that therefore it is unacceptable to say that fpcmp(x,y)==0 if the
*absolute* 
error |x-y| is less than some threshold.  If absolute error were acceptable for
some reason (it does not seem to be for your intended application), then we
could make fpcmp(x,y) and fpcmp(x-y,0.0) compatible, as other responders
have indicated. 

Paul N. Hilfinger, Associate Professor			(415) 642-8401        
Computer Science Division				Hilfinger at Berkeley.EDU
Dept. of Electrical Engineering and Computer Sciences
University of California
Berkeley, CA  94720