A study in code optimization in C

Wed Aug 1 06:28:24 AEST 1990

[In article <1990Jul28.203800.17258 at laguna.ccsf.caltech.edu>,
     bruce at seismo.gps.caltech.edu (Bruce Worden) writes ... ]

> In general, I'd say Richard's code does a pretty good job when moving int's,
> and also when compared to young machines (the BBN and the Meiko i860.)
> In addition, his code is about 20% faster than a simple "for" loop on my
> Sparc 1+, so it illustrates a useful principle as well.  I intend to
> use it in some selected applications, thanks for posting it.

Bruce is one of the few people who seem to have seen the point -- which
(to me, anyway) was just an illustration of C coding technique, not a
claim that it's possible to beat Brand X compiler's mondo-optimized
assembler memcpy().

For collectors of useless numbers, here are results from an 8 MHz
16-bit Motorola 68000 (Atari ST), copying 20K buffers for 1000 iterations:

library memcpy:           17.125 seconds
Richard's gencpy char:    40.755 seconds
Richard's gencpy int:     20.385 seconds
Richard's gencpy long:    15.460 seconds

Details: Sozobon C compiler, dLibs public-domain C library.
Optimizer turned on. int is 16 bits (sizeof(int) == 2); long is 32 bits
(sizeof(long) == 4).
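
For anyone who wants to collect similar numbers, a harness along these
lines would do. It is only a sketch in portable C, not the program that
produced the figures above: time_copy and copy_fn are made-up names, and
clock() reports CPU time at whatever resolution the C library provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Any routine with memcpy's signature can be timed (hypothetical typedef). */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/*
 * Hypothetical helper: copy a bufsize-byte buffer `iterations` times with
 * fn and return the CPU seconds used, or -1.0 if setup or the copy failed.
 */
double time_copy(copy_fn fn, size_t bufsize, long iterations)
{
    unsigned char *src, *dst;
    clock_t start, stop;
    double secs = -1.0;
    long i;

    assert(fn != NULL && bufsize > 0 && iterations > 0);

    src = malloc(bufsize);
    dst = malloc(bufsize);
    if (src != NULL && dst != NULL) {
        memset(src, 0xA5, bufsize);   /* recognizable pattern */
        memset(dst, 0, bufsize);

        start = clock();
        for (i = 0; i < iterations; i++)
            fn(dst, src, bufsize);
        stop = clock();

        /* Report a time only if the clock works and the data arrived. */
        if (start != (clock_t)-1 && stop != (clock_t)-1 &&
            memcmp(dst, src, bufsize) == 0)
            secs = (double)(stop - start) / CLOCKS_PER_SEC;
    }
    free(src);
    free(dst);
    return secs;
}
```

Calling time_copy(memcpy, 20 * 1024, 1000) matches the buffer size and
iteration count quoted above; any other routine with the same signature
can be passed in its place.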

The dLibs memcpy is coded in assembler, moves 16-bit words when possible,
and DOES check for overlaps (as in memmove). The copy is a simple loop.
Loading and dumping registers with movem.l might be faster; I have not tried it.
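
The same strategy (check for overlap, then move 16-bit words when both
addresses allow it) can be sketched in C. This is NOT the dLibs source,
which is assembler; overlap_copy is a made-up name. The sketch assumes a
flat address space, so that pointers can be compared as integers, and it
reads and writes memory through a uint16_t pointer, which strict ISO C
does not promise will work for arbitrary buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memmove-style copy that uses 16-bit moves when it can. */
void *overlap_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    uintptr_t da = (uintptr_t)dst, sa = (uintptr_t)src;
    /* Word moves need both addresses even; same parity gets us there. */
    int same_parity = ((da ^ sa) & 1) == 0;

    assert(n == 0 || (dst != NULL && src != NULL));
    if (n == 0 || da == sa)
        return dst;

    if (da < sa || da - sa >= n) {
        /* dst is not inside the source range: copy low to high. */
        if (same_parity) {
            uint16_t *dw;
            const uint16_t *sw;
            if (da & 1) { *d++ = *s++; n--; }    /* reach an even address */
            dw = (uint16_t *)d;
            sw = (const uint16_t *)s;
            for (; n >= 2; n -= 2)
                *dw++ = *sw++;
            d = (unsigned char *)dw;
            s = (const unsigned char *)sw;
        }
        while (n-- > 0)                           /* leftover bytes */
            *d++ = *s++;
    } else {
        /* dst starts inside the source range: copy high to low. */
        d += n;
        s += n;
        if (same_parity) {
            uint16_t *dw;
            const uint16_t *sw;
            if ((uintptr_t)d & 1) { *--d = *--s; n--; }
            dw = (uint16_t *)d;
            sw = (const uint16_t *)s;
            for (; n >= 2; n -= 2)
                *--dw = *--sw;
            d = (unsigned char *)dw;
            s = (const unsigned char *)sw;
        }
        while (n-- > 0)                           /* leftover bytes */
            *--d = *--s;
    }
    return dst;
}
```

When the two addresses differ in parity, no word move is possible without
an odd-address access (which faults on a 68000), so the sketch falls back
to bytes.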
-- 
   Steve Yelvington at the (rain-replenished) lake in Minnesota
   steve at thelake.mn.org