RS/6000 Model 320 FP Performance

Fri Nov 2 02:41:30 AEST 1990

In article <1990Oct31.233855.1371 at ux1.cso.uiuc.edu>,
bowman at uiatma.atmos.uiuc.edu writes:
|> In article <MCCALPIN.90Oct31170825 at pereland.cms.udel.edu>
|> mccalpin at perelandra.cms.udel.edu (John D. McCalpin) writes:
|> >
|> >Ooops, there must have been some typo in my code.  I extracted the
|> >code from the tech report again and got the following absolutely
|> >phenomenal results!
|> >
|> >	IBM RS/6000 Model 320 Matrix Multiply Performance
|> >	Matrix Order   Time per MM        MFLOPS
|> >	        32       .002             29.789
|> >	        64       .019             27.594
|>  .
|>  .
|>  .
|> 
|> The value of tailoring the algorithms to the architecture is apparent.  Is
|> anyone, including IBM, planning or willing to produce a library of basic
|> linear algebra subroutines that are optimized for the 6000?  Think of the
|> clock cycles that would be saved!

Yes.  Look for /lib/libblas.a.  In the initial release, dgemm, sgemm, dgemv
and sgemv are optimized.  In the update announced last Tuesday, the library
will be refreshed with 22 tuned single and double precision routines:
[sd]gemv, [sd]trmv, [sd]trsv, [sd]gemm, [sd]symm, [sd]ger, [sd]trmm,
[sd]trsm, [sd]syrk, [sd]axpy and i[sd]amax.
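
For anyone unfamiliar with the names: [sd]gemm is the general matrix
multiply, which in the standard BLAS definition computes
C := alpha*op(A)*op(B) + beta*C.  The following is only a rough sketch of
that operation for the no-transpose case, written in plain Python for
readability.  The function name gemm_reference and the list-of-rows storage
are illustrative assumptions; the real routine takes Fortran column-major
arrays with leading dimensions, and this says nothing about how the tuned
library is implemented.

```python
def gemm_reference(alpha, a, b, beta, c):
    """Return alpha*A*B + beta*C (no transposes).

    Hypothetical reference helper, not the library interface:
    a is m x k, b is k x n, c is m x n, each stored as a list of rows.
    """
    m, k, n = len(a), len(b), len(b[0])
    result = []
    for i in range(m):
        row = []
        for j in range(n):
            # Inner product of row i of A with column j of B.
            dot = sum(a[i][p] * b[p][j] for p in range(k))
            row.append(alpha * dot + beta * c[i][j])
        result.append(row)
    return result
```

With alpha = 1 and beta = 0 this reduces to an ordinary matrix product,
which is the case the timings quoted above are measuring.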

Search for 'blas' in info for documentation on the routines.  The interfaces
are the same as the LAPACK BLAS.  The library includes the full set of
BLAS routines, but only the ones listed above have been optimized.
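
As a sanity check on the quoted table, the MFLOPS column can be approximately
reproduced from the other two columns.  The sketch below assumes the usual
convention of counting 2*n^3 floating point operations for an n x n matrix
multiply; that convention and the function name matmul_mflops are my
assumptions, not something stated in the quoted post.

```python
def matmul_mflops(order, seconds):
    """Estimate MFLOPS for one n x n matrix multiply.

    Assumes the conventional count of 2*n**3 operations
    (n**3 multiplies plus n**3 adds).
    """
    flops = 2.0 * order ** 3
    return flops / seconds / 1.0e6


# (matrix order, time per multiply in seconds) from the quoted table
for order, seconds in [(32, 0.002), (64, 0.019)]:
    print(order, round(matmul_mflops(order, seconds), 1))
```

For order 64 this gives about 27.6, in line with the quoted 27.594.  For
order 32 it gives about 32.8 rather than 29.789, presumably because the
time is only shown to three decimal places.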

--------------------------------------------------------------------
Stephen Linam   AWD Austin   T/L: 793-3674  Bell-net: (512) 832-3674
IBM Internet: sdl at adagio.austin.ibm.com        VNET: LINAM at AUSTIN
UUCP:  ...!cs.utexas.edu:ibmchs!auschs!adagio.austin.ibm.com!sdl