Floating point exactness & alternatives (summary)

Rick Jones rick at tetrauk.UUCP
Mon Aug 6 21:49:41 AEST 1990


Thanks to all those who responded to my query about exactness (or lack of it)
in FP numbers.  It has confirmed my suspicion that there isn't a clean solution
to this problem.

The responses can be grouped into 4 broad categories:

a.	"The effective precision is a function of the algorithm and initial data,
	not just the representation"

	While this is true, the representation puts a limit on the precision
	even when the algorithm may suggest something higher.  But more
	fundamentally, this is an issue of _precision_ as opposed to
	_exactness_, which are not quite the same.  If the context is dealing
	with real-world values, they can all be assumed to be irrational, and
	known only to within some understood precision.  When dealing with more
	artificial numbers (see below), they are presumed to be exact, and
	unexpected problems arise because there is not a 1-1 correspondence
	between exact decimal numbers and exact binary FP numbers.

	As my example showed, you don't _expect_ to get into precision issues
	when checking if .291/2 = .1455

b.	"Analyse the bit-pattern of the IEEE format"

	This came from Stefan & Carlos at U. of Basel, and although possible,
	it feels a bit non-portable & hardware specific. We write portable
	packages which run on all sorts of hardware - just how universal IS the
	IEEE format?  (that's not an entirely rhetorical question)
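
	One possible way to look at the representation without poking at
	bit-patterns directly is to go through <float.h> and frexp().  This
	is only a sketch and is not what Stefan & Carlos proposed; the
	function names are made up, and significand() assumes a radix-2
	format whose significand fits in a long long (C99):

```c
#include <float.h>
#include <math.h>

/* Nonzero if double has the parameters of the IEEE 754 64-bit format.
   This checks the format's shape only, not its rounding behaviour. */
int double_is_ieee_like(void)
{
    return FLT_RADIX == 2 && DBL_MANT_DIG == 53
        && DBL_MIN_EXP == -1021 && DBL_MAX_EXP == 1024;
}

/* Split x into an integer significand and a binary exponent such that
   x == significand * 2^(*exp2), without assuming a memory layout. */
long long significand(double x, int *exp2)
{
    int e;
    double frac = frexp(x, &e);      /* x == frac * 2^e, 0.5 <= |frac| < 1 */
    *exp2 = e - DBL_MANT_DIG;
    return (long long)ldexp(frac, DBL_MANT_DIG);
}
```

	On a machine where double_is_ieee_like() is true, significand(0.1455,
	&e) should leave e at -55, showing that the stored value is an integer
	multiple of 2^-55 and so cannot be exactly .1455.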

c.	The "constructive reals" solution

	This is interesting, and I will follow up the various references.  It
	seems quite well-known, but sounds a bit computationally expensive,
	though I don't know enough yet to make a qualified comment.

d.	The "continued fractions" solution

	The reference to this came from Colin Plumb, and is also something I
	shall follow up.  It appears to have an advantage that all rational
	numbers have a finite representation.
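
	To illustrate the finite-representation property, here is a small
	sketch that expands a rational num/den into its continued fraction
	terms using Euclid's algorithm.  The function name and interface are
	my own assumptions, not something taken from Colin's reference:

```c
#include <assert.h>
#include <stddef.h>

/* Write the continued fraction terms of num/den into terms[] and return
   how many were produced (at most max_terms).  For a rational input the
   loop always terminates, because the remainder shrinks at every step. */
size_t continued_fraction(long num, long den, long terms[], size_t max_terms)
{
    size_t n = 0;

    assert(num >= 0 && den > 0);
    while (den != 0 && n < max_terms) {
        long q = num / den;
        long r = num % den;
        terms[n++] = q;
        num = den;      /* continue with the reciprocal of the remainder */
        den = r;
    }
    return n;
}
```

	For .1455 = 291/2000 this yields the eight terms 0, 6, 1, 6, 1, 6, 2,
	2, so the value is held exactly in a handful of small integers.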


For interest (if anyone's still with me :-), I'll explain the problem domain.
I deliberately omitted this from my initial query to get as wide a response as
possible.  This is nothing more taxing (!) than business systems, which are not
known for requiring complex mathematics.  They don't, but they do require
exactness.  Auditors have this annoying view that accounts must balance to the
penny, not 1 part in 10 to-the-something.  There is a good case for using some
form of BCD representation, but there are many programming advantages in using
the embedded numerical types of the language (yes, I do know about OOP and
building a BCD class with overloaded operators, that's one of my options).

In fact, the only disadvantage of FP is the exactness issue.  15 significant
digits is enough, the 8-byte representation is compact and fixed, the arithmetic
operators are embedded and the computation done in an FPP in all except toy
computers.  Unfortunately, exactness cannot be controlled by applying a
system-wide precision in terms of decimal places.  To illustrate:

	My system may be handling currency values in units of Lira, Yen, or
	some other inconsiderate currency with a very small basic denomination,
	and need to go up to quantities of 10^9 or more.  I may also have some
	stock kept in bulk, and need to account for quantities down to 4
	decimal places.  This total range is pushing the limits, but no one
	figure needs that total range of precision - I never account for less
	than 1 Yen, and I'm not going to have a billion tons of bulk stock.
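
One way to give each figure its own precision, rather than a system-wide one,
is to carry a decimal scale alongside an integer.  The sketch below is a
hypothetical illustration in C99 (the Decimal type and dec_* names are
invented here), it is not BCD, and it omits overflow checks:

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int64_t units;  /* the value multiplied by 10^scale */
    int     scale;  /* decimal places carried by this value, 0..18 */
} Decimal;

/* Re-express d with more decimal places; the value is unchanged. */
static Decimal dec_rescale(Decimal d, int scale)
{
    assert(scale >= d.scale && scale <= 18);
    while (d.scale < scale) {
        d.units *= 10;
        d.scale++;
    }
    return d;
}

/* Exact addition: align both operands to the finer scale first. */
Decimal dec_add(Decimal a, Decimal b)
{
    int scale = a.scale > b.scale ? a.scale : b.scale;
    a = dec_rescale(a, scale);
    b = dec_rescale(b, scale);
    a.units += b.units;
    return a;
}

/* Exact comparison of values held at possibly different scales. */
int dec_equal(Decimal a, Decimal b)
{
    int scale = a.scale > b.scale ? a.scale : b.scale;
    return dec_rescale(a, scale).units == dec_rescale(b, scale).units;
}
```

Here .1455 is {1455, 4} and .291 is {291, 3}; adding .1455 to itself compares
exactly equal to .291, and a Yen amount with scale 0 never pays for the 4
decimal places that a bulk stock quantity needs.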

Andrew Koenig appreciated this problem; as he said:

	Floating point arithmetic is *hard*.

Precision is dealt with in the appropriate places by rounding, but even this
causes a problem.  I notice that no one tried to provide a solution to the
second, and in many ways more subtle, problem in my original posting.  Given
that the supposed exact but unrepresentable value of .1455 has been arrived at
by two different means where the actual values are incrementally above and
below, and we simply want to round the result to 3 decimal places:  simplistic
rounding will actually _increase_ the discrepancy by yielding .145 and .146
respectively.  Comparison of the values before rounding would indicate them
equal within 15 significant digits.  After rounding, they show a difference in
the 3rd digit.  Accountants find this sort of thing difficult to grasp!
I actually have a general solution for this, but it involves sprintf() and some
sneaky string manipulation, and is hardly elegant.


Thanks for the ideas, they have provided food for thought.

I shall be on holiday for the next 3 weeks;  if you have any Earth-shattering
solutions which haven't come up so far, could you e-mail me please?

-- 
Rick Jones					You gotta stand for something
Tetra Ltd.  Maidenhead, Berks			Or you'll fall for anything
rick at tetrauk.uucp (...!ukc!tetrauk.uucp!rick)	     - John Cougar Mellencamp



More information about the Comp.lang.c mailing list