Relative GL costs

Fri Apr 5 11:40:22 AEST 1991

In article <9104041941.AA29344 at ge-dab.GE.COM> "dwilliam at larry.ATL.GE.COM"@andrew.dnet.ge.com writes:
>"Howard C. Smith" <smith at nextone.niehs.nih.gov> writes:
>> 	Does anyone have numbers as to the relative cost of  
>> particular GL calls? (for each machine in the 4D series). Maybe all  
>> normalized as a percentage of gconfig (presumably the most  
>> expensive). 
>>
>> 	Howard Smith
>> 	smith at nextone.niehs.nih.gov
>> 
> 
>/* 
> * this might be what you are looking for.
> * let me know if you make any interesting enhancements.
> * compile with:
> *    	cc -prototypes -acpp -O -s glbench.c -lm -lgl_s -lc_s -o glbench
> *
> * dan (dwilliams at atl.ge.com)
> *
> * GL benchmarking results sorted numerically for a 210GTX:
> * 
> * swapbuffers                         :      61 calls per second

It's hard to derive a true cost for a GL routine when it involves
the hardware gfx pipeline.  Because the bottleneck can be deep in the
pipe, with lots of FIFO-ing in between, pixie/prof results _can_ be
very misleading.

If you write a benchmark program (like glbench.c) and run the same
primitive over and over, then you _should_ get a reasonable idea
of the cost of that primitive (as long as you do a finish()
to flush the pipe, or do enough iterations that the depth of the pipe
is insignificant).
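
As a rough illustration of that kind of loop, here is a minimal sketch.
It is not taken from glbench.c; fake_primitive and fake_flush are
hypothetical stand-ins for a real GL call and for finish(), and
clock_gettime() is assumed to be available (POSIX).

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

typedef void (*prim_fn)(void);

/* Hypothetical stand-ins: a real benchmark would issue a GL primitive
 * here and call finish() in the flush routine. */
static volatile unsigned long sink;
static void fake_primitive(void) { sink++; }
static void fake_flush(void) { /* finish() would go here */ }

/* Issue the same primitive `iterations` times, flush once, and return
 * the measured rate in calls per second (0.0 if no time elapsed). */
double calls_per_second(prim_fn prim, prim_fn flush, long iterations)
{
    struct timespec t0, t1;
    double elapsed;
    long i;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < iterations; i++)
        prim();
    flush();                /* drain the pipe before stopping the clock */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    elapsed = (double)(t1.tv_sec - t0.tv_sec)
            + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return elapsed > 0.0 ? (double)iterations / elapsed : 0.0;
}
```

Calling calls_per_second(fake_primitive, fake_flush, 1000000L) times a
million back-to-back calls.  The flush sits inside the timed region so
that work still queued in the pipe is charged to the primitive.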

Unfortunately there are exceptions to the above.  Swapbuffers & gsync
wait for the next vertical retrace, so benchmarking them is difficult.
I do know they each make a system call, but the whole routine shouldn't
take more than 100 usecs itself (leaving you 16.56... msec to draw at a
60 Hz frame rate).
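
The arithmetic behind that figure can be sketched as follows.  The
function name is made up for illustration, and the 100 usec overhead is
simply the upper bound quoted above.

```c
/* Time left for drawing in one frame, in microseconds, after
 * subtracting a fixed per-frame overhead (e.g. the swapbuffers call). */
double draw_budget_usec(double refresh_hz, double overhead_usec)
{
    double frame_usec = 1e6 / refresh_hz;   /* 60 Hz: about 16667 usec */
    return frame_usec - overhead_usec;
}
```

draw_budget_usec(60.0, 100.0) comes to roughly 16567 usec, i.e. the
16.56... msec mentioned above.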

Also, benchmarking mapcolor on some machines is difficult due to the
way mapcolor was microcoded. Here are some real numbers for mapcolor
performance.

VGX:    31750 slots/sec
GTX:     7400 slots/sec
G:       2200 slots/sec
PI:      4000 slots/sec

The problem with these is that their inverse is _not_ the cost of
the routine on most machines, because when inserted in a stream of
unrelated commands (that happen not to tickle the same bit of hardware)
the cost may drop to a usec or less.
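
For reference, the naive inverse can be computed as in the sketch
below.  The struct and function names are made up for illustration; the
rates are the ones listed above.  The result is only the per-slot time
when mapcolor calls are issued back to back, not the cost inside a
mixed command stream.

```c
/* mapcolor throughput figures quoted in the post (slots per second). */
struct mapcolor_rate {
    const char *machine;
    double slots_per_sec;
};

static const struct mapcolor_rate rates[] = {
    { "VGX", 31750.0 },
    { "GTX",  7400.0 },
    { "G",    2200.0 },
    { "PI",   4000.0 },
};

/* Naive inverse: microseconds per slot when nothing else is in the
 * stream.  Interleaved with unrelated commands, the effective cost may
 * be far lower. */
double back_to_back_usec(double slots_per_sec)
{
    return 1e6 / slots_per_sec;
}
```

This gives about 31.5 usec per slot on the VGX, 135 on the GTX, 455 on
the G and 250 on the PI, all of them well above the "usec or less" seen
when the call is overlapped with other work.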

On a dumb frame buffer most of this would be very easy because there
is only one processor, but on the VGX there can be 11, some in parallel,
some in series.

--
----------------------------------------------------------------------------
 Brian McClendon bam at rudedog.SGI.COM ...!uunet!sgi!rudedog!bam 415-335-1110
----------------------------------------------------------------------------