Integer Multiply/Divide on Sparc

Robert D. Silverman bs at linus.UUCP
Sat Dec 30 00:18:05 AEST 1989


In article <15418 at vlsisj.VLSI.COM> davidc at vlsisj.UUCP (David Chapman) writes:
:In article <84768 at linus.UUCP> bs at linus.mitre.org (Robert D. Silverman) writes:
:>Does anyone have, or know of, software for the SPARC [SUN-4] that will
:>perform the following:
:>
:> [standard multiply and divide]
:>
:>The SPARC is brain dead [as were its designers] when it comes to doing
:>integer arithmetic. It can't multiply and it can't divide.
:
:There should be instructions on the order of "multiply step" and "divide 
:step", each of which will do one of the 32 adds/subtracts and then shift.  
 
There is a multiply step instruction. There is no such support for division.
It can take 200+ cycles to do a division on the SPARC [worst case]. 
A 32 x 32 bit unsigned multiply takes 45-47 cycles. Programs that have a
significant number of multiplies and divides can run SLOWER on a SPARC
than on a SUN-3 [I have such!], ONLY because of the slow multiplies/divides.
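
For readers unfamiliar with why a software divide costs so much, here is a
minimal C sketch of the standard shift-and-subtract (restoring) division
that a "divide step" would otherwise do in hardware. It is a model of the
algorithm only, not the SPARC library routine; the names udiv32 and
udiv_result are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Quotient and remainder of an unsigned 32-bit division (hypothetical type). */
typedef struct {
    uint32_t quot;
    uint32_t rem;
} udiv_result;

/*
 * Restoring shift-and-subtract division: one quotient bit per iteration,
 * so 32 iterations of shift, compare, and conditional subtract.
 * A model of the algorithm, not of any actual SPARC instruction sequence.
 */
udiv_result udiv32(uint32_t n, uint32_t d)
{
    udiv_result r = {0, 0};
    uint64_t rem = 0;   /* wider than 32 bits so the shifted remainder fits */
    int i;

    assert(d != 0);     /* division by zero is left undefined here */
    for (i = 31; i >= 0; i--) {
        /* Bring down the next dividend bit, most significant first. */
        rem = (rem << 1) | ((n >> i) & 1u);
        if (rem >= d) {
            rem -= d;
            r.quot |= (uint32_t)1 << i;
        }
    }
    r.rem = (uint32_t)rem;
    return r;
}
```

Each of the 32 iterations needs several ordinary instructions when there is
no hardware step to fold them together, which is consistent with the cycle
counts quoted above.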

:I'm not particularly fond of the SPARC architecture (don't like register 
:windows), but this is a theoretical viewpoint and is not based on any 
:direct exposure to assembly-language programming for it (translation:
:sorry, I can't give you any more help).
:
:Neither SPARC nor its designers were brain-dead when it was built.  It's just
 
I didn't say they were. I said they were with respect to arithmetic. I stand
by that assertion. Most programs may not need multiply/divide in hardware.
However, for those that do require it, not having it is a real KILLER 
of algorithms.

:that it is difficult to get multiplication and division (especially the 
:latter) to run in 1 or 2 clock cycles.  All instructions are supposed to
 
I know of quite a few DSP chips that do multiplies in 1 cycle. Divides
take a little longer [but not much; Ernie Brickell of SANDIA invented a
hardware divide that works much faster than standard conditional-shift/
subtract].

:execute in the ALU in 1 cycle; if the multiply and divide instructions take
:more time then the front of the processor pipeline has to be able to stall
:and this added complexity will slow down the entire processor.
:
:Thus they provide you with the tools to do your own multiply and divide.  
 
See above. They are too slow.

:One of the benefits is that a compiler can optimize small multiplies and 
:divides to make them execute quicker (i.e. multiply by 10 takes 4 steps 
 
That's fine for multiply-by-constant. Most programs that NEED multiply/divide
are multiplying variables.
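
To make the distinction concrete, here is a small C sketch of the kind of
strength reduction a compiler can apply when one operand is the constant 10.
The function name mul10 is hypothetical, and the exact instruction sequence
a given SPARC compiler emits may differ.

```c
#include <stdint.h>

/*
 * x * 10 == x * 8 + x * 2, so a constant multiply by 10 reduces to
 * two shifts and one add. Unsigned arithmetic wraps modulo 2^32,
 * exactly as x * 10u would.
 */
uint32_t mul10(uint32_t x)
{
    return (x << 3) + (x << 1);
}
```

No such decomposition is available at compile time when both operands are
variables, which is the case at issue here.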

:P.S.  Don't write a loop on the order of "MULSTEP, DEC, BNZ" or it will be
:      incredibly slow.  Unroll the loop 4 or 8 times (MULSTEP, MULSTEP,
:      MULSTEP, MULSTEP, SUB 4, BNZ).  Branches are expensive.
 
Agreed. In fact my 32 x 32 bit multiply consists of 32 calls to multstep
and no looping at all. It is still slow.
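
A rough C model of that structure, assuming a plain shift-and-add step: the
macro MUL_STEP below is a hypothetical stand-in for the hardware multiply
step, not its exact semantics, and umul32 is an illustrative name. The 32
steps are expanded inline by the preprocessor, so there is no loop counter
and no branch back.

```c
#include <stdint.h>

/*
 * One shift-and-add step: add the multiplicand if the low bit of the
 * multiplier is set, then advance both operands by one bit position.
 * Hypothetical helper; it models the idea, not the SPARC instruction.
 */
#define MUL_STEP(acc, a, b) \
    do { if ((b) & 1u) (acc) += (a); (a) <<= 1; (b) >>= 1; } while (0)

#define MUL_STEP4(acc, a, b) \
    do { MUL_STEP(acc, a, b); MUL_STEP(acc, a, b); \
         MUL_STEP(acc, a, b); MUL_STEP(acc, a, b); } while (0)

#define MUL_STEP32(acc, a, b) \
    do { MUL_STEP4(acc, a, b); MUL_STEP4(acc, a, b); \
         MUL_STEP4(acc, a, b); MUL_STEP4(acc, a, b); \
         MUL_STEP4(acc, a, b); MUL_STEP4(acc, a, b); \
         MUL_STEP4(acc, a, b); MUL_STEP4(acc, a, b); } while (0)

/* 32 x 32 -> 64 bit unsigned multiply, fully unrolled. */
uint64_t umul32(uint32_t x, uint32_t y)
{
    uint64_t acc = 0;
    uint64_t a = x;   /* widened so shifted copies keep their high bits */
    uint32_t b = y;

    MUL_STEP32(acc, a, b);
    return acc;
}
```

Even with the branches gone, the cost is still at least one step per
multiplier bit, so the total stays in the tens of cycles.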

-- 
Bob Silverman
#include <std.disclaimer>
Internet: bs at linus.mitre.org; UUCP: {decvax,philabs}!linus!bs
Mitre Corporation, Bedford, MA 01730