Optimal for loop on the 68020.

Chris Torek chris at mimsy.UUCP
Tue Jun 6 06:11:24 AEST 1989


In article <11993 at well.UUCP> pokey at well.UUCP (Jef Poskanzer) writes:
>... COUNT was a small (< 127) compile-time constant.
>    for ( i = COUNT; --i >= 0; )

[all but gcc -O -fstrength-reduce deleted]

>	moveq  #COUNT,d0
>	jra    tag2
>tag1:
>	<loop body>
>tag2:
>	dbra   d0,tag1
>	clrw   d0
>	subql  #1,d0
>	jcc    tag1

>... But wait!  What's that chud after the loop?  Let's see, clear d1
>to zero, subtract one from it giving -1 and setting carry, and jump
>if carry is clear.  Hmm, looks like a three-instruction no-op to me!

No---the problem is that `dbra' decrements a *word*, compares the
result against -1, and (if not -1) branches.  The semantics of the
loop demand a 32 bit comparison.  The only reason it is not necessary
in this particular case is the first quoted line above: COUNT is a
small compile-time constant, so the counter always fits in 16 bits.
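
To make the 16-bit behaviour concrete, here is a rough C model of a loop
closed by a bare `dbra'.  The names dbra_step and trips_bare_dbra are
made up for this illustration, and d0 is simply modelled as a 32-bit
unsigned integer; this is a sketch of the instruction's effect on the
register, not of any compiler's output.

```c
#include <stdint.h>

/* Hypothetical model of "dbra d0,label": decrement only the low 16 bits
   of d0 and leave the high 16 bits alone.  Returns 1 if the branch is
   taken, i.e. if the low word is not -1 (0xFFFF) after the decrement. */
int dbra_step(uint32_t *d0)
{
    uint16_t low = (uint16_t)(*d0 - 1);      /* low word, minus one */
    *d0 = (*d0 & 0xFFFF0000u) | low;         /* high word untouched */
    return low != 0xFFFFu;
}

/* Number of trips through a loop body closed by a bare dbra,
   entered at the dbra with d0 = n (as in the quoted code). */
uint32_t trips_bare_dbra(uint32_t n)
{
    uint32_t d0 = n, trips = 0;
    while (dbra_step(&d0))
        trips++;
    return trips;
}
```

With this model trips_bare_dbra(100) is 100, as wanted, but
trips_bare_dbra(0x10000) is 0 and trips_bare_dbra(0x10005) is 5: only
the low word of the counter is ever looked at.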

Still, it would be nice if gcc always used the dbra/clrw/subql/jcc
sequence for `--x >= 0' loops, since it does always work.  The `clrw'
fixes up the case where the 16-bit result has gone to -1:

	before decrement:	wxyz 0000
	after decrement:	wxyz FFFF
	after clrw:		wxyz 0000
	after subql:	      wxyz-1 FFFF
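
The fix-up can be checked with a small C model of the whole sequence.
This is only a sketch: trips_c and trips_dbra are names invented here,
d0 is modelled as a 32-bit unsigned integer, and the starting count is
assumed to be non-negative (negative starting values are not modelled).

```c
#include <stdint.h>

/* Trip count of the C loop:  for (i = n; --i >= 0; ) <body>  */
uint32_t trips_c(int32_t n)
{
    uint32_t trips = 0;
    for (int32_t i = n; --i >= 0; )
        trips++;
    return trips;
}

/* Trip count of the dbra/clrw/subql/jcc sequence, entered at the dbra
   with d0 = n.  Assumes n is a non-negative count. */
uint32_t trips_dbra(uint32_t n)
{
    uint32_t d0 = n, trips = 0;
    for (;;) {
        /* dbra d0,tag1: decrement the low word only */
        uint16_t low = (uint16_t)(d0 - 1);
        d0 = (d0 & 0xFFFF0000u) | low;
        if (low != 0xFFFFu) {        /* low word not -1: branch taken */
            trips++;
            continue;
        }
        d0 &= 0xFFFF0000u;           /* clrw d0: wxyz FFFF -> wxyz 0000 */
        int carry = (d0 == 0);       /* subql #1,d0 borrows only from 0 */
        d0 -= 1;                     /* ... -> wxyz-1 FFFF */
        if (carry)                   /* jcc tag1 not taken: loop is done */
            break;
        trips++;                     /* jcc tag1 taken: back to the body */
    }
    return trips;
}
```

Under these assumptions the two functions agree for every count I tried
by hand, including 0, 0xFFFF, 0x10000 and 0x10001, which are the cases
where the low word wraps.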

The dbra loop is so much faster, even counting the extra time and
space for one `unnecessary' dbra+clrw (when the loop really does go
from 0 to -1, and once every 65536 trips when the loop counter is
large and positive), that I would make this optimisation unconditional.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris at mimsy.umd.edu	Path:	uunet!mimsy!chris


