C Floating point arithmetic

Eugene D. Brooks III brooks at lll-crg.ARpA
Fri Nov 29 11:16:53 AEST 1985


>analyst should make the determination.  I have seen enough
>computations that produced total garbage to make me believe
>that the naive user should get double precision by default.
>
>I am in favor of supporting low-precision floating point in
>C, as permitted by X3J11, but let's not make it the default.

This argument is hard to swallow. You are suggesting protection
for the user by not giving him what he has asked for in his
code.  Any intelligent scientific programmer tries it both ways
on his own and determines whether single precision is adequate,
without the help of a numerical analyst.  An analyst is consulted
only when single precision fails and one wants to understand the
root of the problem, in an attempt to rearrange the computation
so that single precision is sufficient.  There are those who do
not take adequate care in
their work but I see no need to save them from themselves by default.

If the user says "float" then he wants float! I have really gotten
tired of carrying along my own compiler which does single precision
arithmetic on floats in order to use C in numerical computation.
I have also gotten tired of trying to defend the use of C as an
efficient numerical language when people constantly complain about
this problem for floating point computation.

C should not promote floats to doubles in expressions or arguments
for the same reasons that it does not promote ints to longs.  It's
not efficient.  This wart was injected into the language as the result
of the nature of the FP11 hardware on the PDP11.  The promotion of chars
and shorts to ints for arguments and expressions has its origin in the
same hardware: the PDP11 sign-extended chars when they were loaded into
registers.  The promotion of char and short to int is not a severe
issue, as most hardware has alignment restrictions on the stack, at
least for the sake of efficiency.  (I did once spend some time trying
to get a 68000 to multiply shorts using the available instructions
instead of promoting to 32-bit ints and then calling subroutines.)
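
To make the promotion concrete, here is a minimal sketch.  It assumes a
compiler that follows the ANSI C (C89) rules, under which a float
product has type float and only variadic or unprototyped calls still
widen float arguments to double.  The names product_size, sum_args and
scale are made up for illustration.

```c
#include <stdarg.h>
#include <stddef.h>

/* Size of the type the compiler assigns to a float * float product.
 * Under the ANSI rules this is sizeof(float); a compiler that promotes
 * floats in expressions would report sizeof(double) instead. */
static size_t product_size(void)
{
    float a = 1.5f, b = 2.0f;
    return sizeof(a * b);
}

/* Variadic function: float arguments passed through "..." are widened
 * to double by the caller, so they must be fetched as double. */
static double sum_args(int n, ...)
{
    va_list ap;
    double total = 0.0;
    int i;

    va_start(ap, n);
    for (i = 0; i < n; i++)
        total += va_arg(ap, double);
    va_end(ap);
    return total;
}

/* Prototyped float parameters: the arguments may be passed and
 * multiplied as floats, with no conversion to double required. */
static float scale(float x, float k)
{
    return x * k;
}
```

Calling sum_args(2, f, g) with two float variables shows the argument
promotion at work, while scale(f, g) is the case where a compiler is
free to stay in single precision throughout.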

This is not true for float and double.  The loss in performance caused
by the spurious conversions to double is a serious issue.  Ask any
scientist who routinely runs 20 CPU-hour jobs on a VAX whether he would
rather have them run in 10 hours.  He will be glad to do a run both ways
to look for
precision problems before moving on to that 100 hour run.
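
Doing a run both ways can be as simple as compiling the same loop for
both types.  A minimal sketch follows, with a made-up accumulation
(adding 0.1 repeatedly) standing in for a real computation.  DEFINE_SUM,
sum_float, sum_double and rel_diff are hypothetical names, and the
comparison is only meaningful if the compiler really does float
arithmetic in single precision.

```c
/* Generate the same summation loop for a given floating type, so the
 * single and double precision runs share one source. */
#define DEFINE_SUM(name, real)          \
    static real name(long n)            \
    {                                   \
        real sum = 0;                   \
        real term = (real)0.1;          \
        long i;                         \
        for (i = 0; i < n; i++)         \
            sum += term;                \
        return sum;                     \
    }

DEFINE_SUM(sum_float, float)    /* the run in single precision */
DEFINE_SUM(sum_double, double)  /* the reference run in double  */

/* Relative difference between the two runs, taking the double result
 * as the reference.  Assumes n > 0, so the reference is positive. */
static double rel_diff(long n)
{
    double s = sum_float(n);
    double d = sum_double(n);
    double diff = s - d;

    if (diff < 0)
        diff = -diff;
    return diff / d;
}
```

If rel_diff(n) is well below the accuracy the problem needs, single
precision is adequate for that run.  If it is not, that is the time to
rearrange the computation or fall back to double.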


