RS/6000 Model 320 FP Performance
Stephen Linam
sdl at adagio.austin.ibm.com
Fri Nov 2 02:41:30 AEST 1990
In article <1990Oct31.233855.1371 at ux1.cso.uiuc.edu>,
bowman at uiatma.atmos.uiuc.edu writes:
|> In article <MCCALPIN.90Oct31170825 at pereland.cms.udel.edu>
|> mccalpin at perelandra.cms.udel.edu (John D. McCalpin) writes:
|> >
|> >Ooops, there must have been some typo in my code. I extracted the
|> >code from the tech report again and got the following absolutely
|> >phenomenal results!
|> >
|> >   IBM RS/6000 Model 320 Matrix Multiply Performance
|> >   Matrix Order   Time per MM   MFLOPS
|> >        32           .002       29.789
|> >        64           .019       27.594
|> .
|> .
|> .
|>
|> The value of tailoring the algorithms to the architecture is apparent. Is
|> anyone, including IBM, planning or willing to produce a library of basic
|> linear algebra subroutines that are optimized for the 6000? Think of the
|> clock cycles that would be saved!
Yes. Look for /lib/libblas.a. In the initial release, dgemm, sgemm, dgemv
and sgemv are optimized. In the update announced last Tuesday, the library
will be refreshed with 22 tuned single and double precision routines:
[sd]gemv, [sd]trmv, [sd]trsv, [sd]gemm, [sd]symm, [sd]ger, [sd]trmm,
[sd]trsm, [sd]syrk, [sd]axpy and i[sd]amax.

Search for 'blas' in info for documentation on the routines. The interfaces
are the same as the LAPACK blas. The library includes the full set of blas
routines; however, only the ones listed above have been optimized.
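
For anyone who has not used the interface, here is a rough sketch in Python
of what a dgemm call computes, assuming the standard Level 3 BLAS definition
C := alpha*op(A)*op(B) + beta*C with column-major storage. It is a plain
reference loop, not the tuned code in the library, and the names ref_dgemm
and mflops are made up for illustration. The mflops helper assumes the usual
2*n^3 operation count for an order-n multiply, which appears consistent with
the order-64 row quoted above.

```python
def ref_dgemm(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc):
    """Reference C := alpha*op(A)*op(B) + beta*C on column-major flat lists.

    The argument order mirrors the standard BLAS dgemm. This is an
    illustrative triple loop, not the optimized library routine.
    """
    ta = transa.upper() != "N"   # True when op(A) is A transposed
    tb = transb.upper() != "N"   # True when op(B) is B transposed
    for j in range(n):
        for i in range(m):
            acc = 0.0
            for p in range(k):
                # Element (r, s) of a column-major array lives at r + s*ld.
                a_ip = a[p + i * lda] if ta else a[i + p * lda]
                b_pj = b[j + p * ldb] if tb else b[p + j * ldb]
                acc += a_ip * b_pj
            if beta == 0.0:
                # With beta = 0 the input contents of C are ignored.
                c[i + j * ldc] = alpha * acc
            else:
                c[i + j * ldc] = alpha * acc + beta * c[i + j * ldc]
    return c


def mflops(order, seconds):
    """MFLOPS for one order-n matrix multiply, assuming 2*n**3 operations."""
    return 2.0 * order ** 3 / seconds / 1.0e6


if __name__ == "__main__":
    # A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]], stored by columns.
    a = [1.0, 3.0, 2.0, 4.0]
    b = [5.0, 7.0, 6.0, 8.0]
    c = [0.0] * 4
    print(ref_dgemm("N", "N", 2, 2, 2, 1.0, a, 2, b, 2, 0.0, c, 2))
    print(mflops(64, 0.019))
```

The leading dimensions (lda, ldb, ldc) are the column strides of the stored
arrays, so a submatrix of a larger array can be passed without copying.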
--------------------------------------------------------------------
Stephen Linam AWD Austin T/L: 793-3674 Bell-net: (512) 832-3674
IBM Internet: sdl at adagio.austin.ibm.com VNET: LINAM at AUSTIN
UUCP: ...!cs.utexas.edu:ibmchs!auschs!adagio.austin.ibm.com!sdl