Floating Point Expectations

Andrew P. Mullhaupt amull@Morgan.COM
Fri May 25 16:11:36 AEST 1990


In article <1990May24.132423.3080@eddie.mit.edu>, aryeh@eddie.mit.edu (Aryeh M. Weiss) writes:
> In article <411@yonder.UUCP> michael@yonder.UUCP (Michael E. Haws) writes:
> >Is it reasonable to expect that the following code should
> >work the same on all platforms, and produce "good" results?
> >        if ((45.0 / 100.0) == .45)
> >		printf("good\n");
> >
> 
> Only numbers that can be expressed exactly in base 2 will produce
> "good" results.  0.1 and 0.01 are prime examples of numbers that
> cannot.  (A common textbook example is accumulating 0.01 100 times and
> not getting 1.0.) 

Not so fast there! There are still computers around with BCD (binary
coded decimal) floating point, both in hardware and software. There
are even machines which do not have an ordinary radix, such as the
Wang 'logarithmic' floating point representation. What you really
mean to say here is that floating point numbers which are rational
fractions with denominators given by a power of the radix may escape
rounding.
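
(On a radix-2 machine the quoted textbook example does fail, while
on a BCD machine it would not. A minimal C sketch, assuming radix 2:

	#include <stdio.h>

	int main()
	{
	    double sum = 0.0;
	    int i;

	    /* 0.01 has no exact radix-2 representation, so each
	       addition is rounded and the total drifts from 1.0 */
	    for (i = 0; i < 100; i++)
	        sum += 0.01;

	    printf("sum == 1.0 is %s\n", sum == 1.0 ? "true" : "false");
	    return 0;
	}

On a typical radix-2 machine this prints "false"; on a decimal
machine the sum would come out exactly 1.0.)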

Portable and careful use of floating point is not based on assumptions
about when rounding may occur or in what way numbers will be rounded.
A good example of this is "Moler's expression", found in various
places (e.g. LINPACK). This expression attempts to determine the
machine epsilon on the assumption that the radix of the representation
is not a power of three; even though there are very few (if any)
machines for which this assumption fails, the sources carry an
explicit comment saying so.


(A FORTRAN version of Moler's expression might be:

	((4.0/3.0) - 1.0) * 3.0 - 1.0

in single precision.)
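
A C rendering might look like the sketch below; it is not the LINPACK
source, and the volatile qualifiers are only there to defeat evaluation
in wider registers, which would otherwise spoil the estimate:

	#include <stdio.h>
	#include <math.h>

	int main()
	{
	    /* If the radix is not a power of three, 4.0/3.0 must be
	       rounded, and the expression recovers a quantity on the
	       order of the machine epsilon. On a radix-3 machine it
	       would come out exactly zero. */
	    volatile float t = 4.0f / 3.0f;
	    volatile float u = (t - 1.0f) * 3.0f - 1.0f;

	    printf("single precision epsilon is about %g\n",
	           fabs((double) u));
	    return 0;
	}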

Now it is sometimes quite useful to assume that the radix of the
floating point representation is known, but this assumption cannot
be regarded as portable. Two such cases come to mind. The first is
balancing a matrix before applying the QR algorithm to compute
eigenvalues, where the matrix is scaled by a diagonal of powers of
the base of the floating point representation:

	"The diagonal matrix D is chosen to have the form
	 diag(beta^i1, beta^i2, ..., beta^in) where beta is the
	 floating point base. Note that D^-1 A D can be computed
	 without roundoff."

From: Golub and Van Loan, _Matrix Computations_, 2nd ed., p. 381,
Johns Hopkins University Press, Baltimore, 1989.
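
Here is how that exactness might be expressed in C; the helpers below
are hypothetical, not taken from LINPACK or EISPACK, and FLT_RADIX
from <float.h> is the machine base beta:

	#include <float.h>

	/* beta^e for integer e. Each multiply by the radix only bumps
	   the exponent, so the result is exact, and so is taking its
	   reciprocal, barring overflow or underflow. */
	static double pow_radix(int e)
	{
	    double s = 1.0;
	    int k = e < 0 ? -e : e;

	    while (k-- > 0)
	        s *= FLT_RADIX;
	    return e < 0 ? 1.0 / s : s;
	}

	/* Overwrite A with D^-1 A D, where A is n by n (row major)
	   and D = diag(beta^d[0], ..., beta^d[n-1]). Every entry is
	   multiplied by beta^(d[j] - d[i]), a power of the base, so
	   no roundoff occurs. */
	void balance_scale(int n, double a[], int d[])
	{
	    int i, j;

	    for (i = 0; i < n; i++)
	        for (j = 0; j < n; j++)
	            a[i * n + j] *= pow_radix(d[j] - d[i]);
	}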

The second case concerns computing the matrix exponential by scaling
and squaring, where the same authors (in the same book, on p. 557)
point out that division by a power of two is particularly convenient
because the resultant power amounts to repeated squaring. They _do
not_ make any remark about the absence of roundoff. This is because
the algorithm is particularly suited to squaring, and the division
is by a power of two for that reason, not because two may be the
floating radix. If they had had in mind only machines with radix 2,
they would likely have remarked on the roundoff advantage as well.
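
The scalar analogue makes the point plain: e^x = (e^(x/2^j))^(2^j),
and the power 2^j is undone by j squarings whatever the machine
radix. A hypothetical sketch:

	#include <math.h>

	/* Scale x down by 2^j until it is small, approximate e^x by
	   a crude cubic Taylor series, then square the result j times.
	   The power of two is chosen because the squarings undo it,
	   not because 2 is assumed to be the machine radix. */
	double exp_ss(double x)
	{
	    int j = 0;
	    double r;

	    while (fabs(x) > 0.5) {
	        x /= 2.0;
	        j++;
	    }
	    r = 1.0 + x * (1.0 + x * (0.5 + x / 6.0));
	    while (j-- > 0)
	        r *= r;
	    return r;
	}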

It becomes quite clear that this authoritative and very recent
reference is written from the point of view that you should _not_
assume the radix of the floating point representation is 2, even
when it might be mildly beneficial to take advantage of this. In
any case where "all platforms" are under consideration, or even
"all UNIX platforms", one should not _expect_ the dyadic rationals
to be exempt from rounding. (Golub and Van Loan are in fact silent
on the subject of non-radix floating representations, since on
p. 61 they give a radix model for floating point representations.)

Later,
Andrew Mullhaupt


