A note for those not consumed by efficiency worries (was: function calls)

Thu Mar 22 03:48:50 AEST 1990

/ hpcupt1:comp.lang.c / jlg at lambda.UUCP (Jim Giles) /  3:07 pm  Mar 20, 1990 /
From article <29551 at amdcad.AMD.COM>, by tim at nucleus.amd.com (Tim Olson):
> [...]
> It might be true that scientific routines written in FORTRAN may have
> this many live, non-overlapping variables to keep in registers, but I
> don't believe this is true in general.  Statistics from a large
> collection of programs and library routines (a mix of general and
> scientific applications written in C) show that of 782 functions (620
> of which were non-leaf functions), an average of 6.5 registers per
> function were live across function calls.

This statistic can only be interpreted in one way: the C compiler in
question didn't allocate registers very well.  Especially in scientific
packages, there are _HUGE_ numbers of 'live' _VALUES_ to deal with during
execution of even simple routines.  Vectors, arrays, lists, strings, etc.,
are all being either produced or consumed.  The fact that none of these
_VALUES_ were in registers at the time of the call indicates one of two
things: 1) the code in question was fragmented to the point that most
procedures had only a few data items (and scalar at that), or 2) the
compiler simply wasn't using the registers to anywhere near their potential.
Since you imply the code was well written, I reject the first explanation.
That leaves the compiler.

My experience (I don't have statistics) with both Fortran and C is that
good compilers generally PACK the registers with as much live data as
possible.  Even an apparently pure scalar loop that does only 'simple'
operations may be 'unrolled' a few times to make better use of the
registers.  Compilers are becoming available that apply that optimization
automatically (so this isn't just a case which applies only to 'coder
enhanced' code).  If such a loop (and this is still the simple kind,
mind you) had a procedure call embedded in it, all the registers that
the procedure might use would have to be spilled to memory - and then
reloaded on return.  On the Cray, spilling and reloading just _ONE_
vector register is over 150 clocks (64 elements at one clock each
to and from memory plus time for the memory pipeline plus a little
overhead to set up stride, address, etc.).  This is _not_ a tiny
problem.
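
The arithmetic behind that figure can be sketched in C.  Only the 64
elements and the one clock per element come from the text above; the
function name is made up for illustration, and the pipeline and setup
values in the usage note are placeholder guesses, not measured Cray
timings.

```c
/* Rough estimate of the clocks needed to spill one vector register to
 * memory and reload it afterwards.
 *
 * pipeline_latency and setup_overhead are per-transfer costs; the post
 * gives no numbers for them, so callers supply their own assumptions. */
int spill_reload_clocks(int vector_length, int clocks_per_element,
                        int pipeline_latency, int setup_overhead)
{
    /* One direction: either the store (spill) or the load (reload). */
    int one_way = vector_length * clocks_per_element
                + pipeline_latency + setup_overhead;

    /* The register goes out to memory once and comes back once. */
    return 2 * one_way;
}
```

With the assumed values, spill_reload_clocks(64, 1, 10, 3) gives
2 * (64 + 10 + 3) = 154 clocks.  Whatever the pipeline and setup costs
really are, the element transfers alone set a floor of 2 * 64 = 128.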

Again I say, this scheme of having 'preserved' vs. 'temp' registers
for procedure calls only _appears_ to save time.  In truth, it deprives
you of registers which could be put to use (at least if your compiler
was clever enough).  The only solution to the problem is to 'inline'
(or, at least, schedule registers globally).  At present, in most
environments, the only way to do this is by manually 'inlining' the
procedures.  Let's hope that more automatic solutions become generally
available in the next few years.
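
As a minimal sketch of what manual 'inlining' looks like in C, compare
a loop that calls a small helper with the same loop after the helper's
body has been substituted in and the loop unrolled four times.  The
names scale_add, axpy_call and axpy_inlined are hypothetical, chosen
only for this illustration; whether the second form is actually faster
depends on the compiler and the machine.

```c
#include <stddef.h>

/* Hypothetical helper, small enough that call overhead can matter. */
double scale_add(double a, double x, double y)
{
    return a * x + y;
}

/* Call inside the loop: unless the compiler inlines the helper or
 * schedules registers across the call, values held in registers that
 * the callee may use have to be saved and restored around each call. */
void axpy_call(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = scale_add(a, x[i], y[i]);
}

/* Helper body substituted by hand and the loop unrolled by four, so
 * the compiler sees one call-free body it can fill registers with. */
void axpy_inlined(size_t n, double a, const double *x, double *y)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        y[i]     = a * x[i]     + y[i];
        y[i + 1] = a * x[i + 1] + y[i + 1];
        y[i + 2] = a * x[i + 2] + y[i + 2];
        y[i + 3] = a * x[i + 3] + y[i + 3];
    }

    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Both routines perform the same assignments in the same order, so the
rewrite does not change results; it only removes the call boundary.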

By the way, aside from a problem with aliasing in C, Fortran and C are
identical with respect to optimization, register allocation, etc.
So, your implied put-down of Fortran is not relevant in this context.

J. Giles
----------