SGI GL matrix performance -- more benchmarks, this time on a PI

Sun Apr 28 14:31:38 AEST 1991

fritz is a 4D/50GT, with only an 8MHz CPU.  I recompiled using the -O flags
and ran my program on a few other machines.  We have a PI with GL version
3.3, which yielded results similar to your avogadro numbers (sorry - couldn't 
resist!), but with no large penalty in the hardware when going to -O3.  I have
no idea what could have caused the disparity you recorded, unless you had some
other applications running which accessed the GL pipeline...

Here are some interesting results from our 4D/310VGX (33 MHz, 64MB RAM):

10000 iterations on yogi, with GL version: GL4DVGX-3.3

Compiler O level:                 -O0         -O1         -O2         -O3
Software - no optimization:     0.880 sec.  0.750 sec.  0.360 sec.  0.360 sec.
Software - some optimization:   0.350 sec.  0.250 sec.  0.210 sec.  0.200 sec.
Software - more optimization:   0.290 sec.  0.200 sec.  0.210 sec.  0.200 sec.
Hardware - preserve CTM:        2.700 sec.  2.700 sec.  2.690 sec.  2.690 sec.
Hardware - destroy CTM:         2.620 sec.  2.620 sec.  2.610 sec.  2.610 sec.
Hardware - abandon results:     0.430 sec.  0.440 sec.  0.450 sec.  0.430 sec.

Note that even my "slowest" implementation of a 4x4 multiply compiled with
a -O3 flag runs faster than the hardware.  And my "most optimized" version
runs twice as fast, even when compiled with -O1, the compiler default.  I'm 
using MIPS cc Version 2.00, running under IRIX 3.3.2 on the VGX (the other 
SGIs here run 3.3.1). 
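
For anyone who wants to try the software side of this comparison, here is
a minimal sketch of what two such variants might look like.  It is not the
code behind the numbers above: the names Mat4, mat4_mul_loop,
mat4_mul_unrolled and time_multiply are made up for illustration, and the
hardware path is left out because it needs the GL matrix stack.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical 4x4 float matrix type; the original program's type is not shown. */
typedef float Mat4[4][4];
typedef void (*Mat4MulFn)(Mat4, Mat4, Mat4);

/* Plain triple loop: out = a * b.  out must not alias a or b. */
void mat4_mul_loop(Mat4 a, Mat4 b, Mat4 out)
{
    int i, j, k;
    for (i = 0; i < 4; i++) {
        for (j = 0; j < 4; j++) {
            float sum = 0.0f;
            for (k = 0; k < 4; k++)
                sum += a[i][k] * b[k][j];
            out[i][j] = sum;
        }
    }
}

/* Same product with the two inner loops unrolled by hand. */
void mat4_mul_unrolled(Mat4 a, Mat4 b, Mat4 out)
{
    int i;
    for (i = 0; i < 4; i++) {
        float a0 = a[i][0], a1 = a[i][1], a2 = a[i][2], a3 = a[i][3];
        out[i][0] = a0 * b[0][0] + a1 * b[1][0] + a2 * b[2][0] + a3 * b[3][0];
        out[i][1] = a0 * b[0][1] + a1 * b[1][1] + a2 * b[2][1] + a3 * b[3][1];
        out[i][2] = a0 * b[0][2] + a1 * b[1][2] + a2 * b[2][2] + a3 * b[3][2];
        out[i][3] = a0 * b[0][3] + a1 * b[1][3] + a2 * b[2][3] + a3 * b[3][3];
    }
}

/* CPU seconds spent on `iterations` calls of `mul`, measured with clock(). */
double time_multiply(Mat4MulFn mul, long iterations)
{
    Mat4 a, b, out;
    volatile float sink;   /* keeps the result live so the loop is not dropped */
    clock_t start, stop;
    long n;
    int i, j;

    assert(iterations > 0);
    for (i = 0; i < 4; i++) {
        for (j = 0; j < 4; j++) {
            a[i][j] = (float)(i + j);
            b[i][j] = (float)(i - j);
        }
    }

    start = clock();
    for (n = 0; n < iterations; n++)
        mul(a, b, out);
    stop = clock();

    sink = out[0][0];
    (void)sink;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling time_multiply(mat4_mul_loop, 10000) and
time_multiply(mat4_mul_unrolled, 10000) at each -O level gives one row of
a table like the one above for each variant.  On a current machine 10000
iterations will probably finish below the resolution of clock(), so the
count would need to be raised a good deal to get usable numbers.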

Although these results seem to favor a software implementation on a machine
with a fast enough CPU, I have a feeling that if you threw in a few other 
processes and some NFS file I/O, the hardware would probably end up back 
on top.  It's good food for thought, though.

Jamie

Texas A&M University
Visualization Laboratory
jamie at archone.tamu.edu
Newsgroups: comp.sys.sgi
Subject: Re: SGI GL matrix performance -- more benchmarks, this time on a PI
References: <1991Apr27.204135.18538 at cunixf.cc.columbia.edu>
Organization: College of Architecture, Texas A&M University