Another s*y question (cost of scaling and code diddling)

Sun Apr 30 04:05:43 AEST 1989

Looking into this, I discovered at least one counter-intuitive result,
so this might be worth reading even if you've grown tired of the thread.

In article <1266 at l.cc.purdue.edu> cik at l.cc.purdue.edu (Herman Rubin) writes:
>Now suppose I am doing some serious array operations, and I have to know
>whether one array buffer is longer than another.  The elements are of type
>long.  Do I have to do this multiplying and dividing by 4 all the time?
>Another example of "user-friendly" which turns out to be "user-inimical."

Many people have jumped on this, but it does point to one of the
fundamental C mind sets, which turns out to have some exceptions:

	"For heavy array operations, it is usually faster to
	use pointers than array indexing."
			- me.

That is because the scaling operation is only done once, when the
pointer is incremented, rather than at each reference to the
array[index].  Specifically, where a and b are arrays of non-char types:

	for( ap = a, bp = b; i--;  ) /* even I will use a comma op here! */
		*ap++ = *bp++;

Is likely to be noticably faster than the functionally equivalent:

	while( i-- )
		a[i] = b[i];

While a hyper-optimising compiler _might_ manage to write code with
equivalent performance, we all believe that to rarely be the case.

Now, when looking into this, I wrote some test programs just to make
sure I wasn't lying.  These were compiled with gcc 1.34 -O, the most
optimising compiler I have handy.

	# define LEN 10
	int a[LEN], b[LEN];
	void f1()
	{
		register int i = LEN;
		register int *ap, *bp;
		for( ap = a, bp = b; i--; )
			*ap++ = *bp++;
	}
	void f2()
	{
		register int i = LEN;
		while( i-- )
			a[i] = b[i];
	}

The scaling operation done in the loop costs two extra instructions. 

    pointer:
	L%5:			<< one instruction loop body
		mov.l (%a0)+,(%a1)+	<< move word + bump pointers
	L%2:
		dbra %d0,L%5

    array:
	L%9:			<< three instruction loop body
		mov.l %d1,%d0			<< grab index
		asl.l &2,%d0			<< scale
		mov.l (%a0,%d0.l),(%a1,%d0.l)	<< then move word
	L%7:
		dbra %d1,L%9

What came as a surprise to me was that pointer operations may NOT
be faster than arrays for use with char types.  Changing the int arrays
to char arrays gives the following single instructions for the loop body.

   pointer:
		mov.b (%a0)+,(%a1)+

   array:
		mov.b (%a0,%d0.l),(%a1,%d0.l)

No scaling is involved, so it comes down to the price of using an
indexed move vs. an unindexed move with two pointer increments.  This is
heaviliy in the realm of architecture and hardware implementation.  One
can imagine machines where pointers are faster, and others where the
reverse was true.

Thus, the maxim becomes modified:

	"For heavy array operations, it is usually faster to
	use pointers than array indexing.

	"If you are really trying to squeeze out the last cycle,
	you ought to try it both ways and measure the result because
	what is good on Brand X may be bad on Brand Y."

			- me.

Few really large programs will be affected by this sort of thing.
Clarity of expression may make arrays preferable, particularly if a
published algorithm you are stealing is written that way.  Given a large
enough search space, a poorly written hash or tree will still be a lot
faster than a great register pointer based linear search.

But in quite a lot of smaller programs this sort of thing can have a
significant affect.  For instance, the recently posted "splay compress"
functions can be speeded up 50% by changing arrays to pointers.

-dB
-- 
"An expert is someone who's right 75% of the time"
{sun,mtxinu,amdahl,hoptoad}!rtech!gonzo!daveb	daveb at gonzo.uucp