Efficient use of lighting models

Tue Oct 30 07:43:05 AEST 1990

In article <9010250912.aa15161 at VGR.BRL.MIL>,
XBR2D96D at DDATHD21.BITNET (Knobi der Rechnerschrat) writes:
|> ...
|>   Using a 70/GT (and other models) we know that multi-colored surfaces
|> (every vertex using a different material) are slower than uni-colored
|> surfaces (all vertices using the same material). Our current algorithm
|> for the multi-colored stuff looks like this:
|> 
|>   bgntmesh;
|>   for all vertices {
|>     if(newmat != oldmat) { lmbind(MATERIAL,newmat);oldmat=newmat;}
|>     /* the oldmat/newmat stuff is just to avoid unnecessary lmbind's */
|>     n3f(normals);
|>     v3f(coordinates);
|>     }
|>   endtmesh;
|> 
|>  The question arises whether it is faster to use just one material and
|> change its properties using lmcolor/cpack ...

The answer is yes, it is and will continue to be faster to change a single
material property by using lmcolor mode than by changing materials.
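
As a rough sketch of the lmcolor approach, assuming only the diffuse color
varies per vertex: the loop below issues one cpack per vertex and no lmbind.
The stub functions and constant values exist only so the sketch compiles
without IRIS GL, and struct mesh_vertex and draw_mesh_lmcolor are
hypothetical names, not part of the library.

```c
#include <stddef.h>

/* Stand-ins so the sketch compiles without IRIS GL.  On an SGI system,
 * drop this section and include <gl/gl.h> instead.  The constant values
 * below are placeholders, not the real header values. */
enum { LMC_COLOR = 0, LMC_DIFFUSE = 1 };
static long mode_changes, color_calls, vertices_sent;
static void lmcolor(long mode)         { (void)mode;  mode_changes++; }
static void cpack(unsigned long color) { (void)color; color_calls++; }
static void n3f(float n[3])            { (void)n; }
static void v3f(float v[3])            { (void)v; vertices_sent++; }
static void bgntmesh(void)             { }
static void endtmesh(void)             { }

/* Hypothetical per-vertex record; the field names are illustrative. */
struct mesh_vertex {
    float normal[3];
    float coord[3];
    unsigned long diffuse;   /* packed color, as passed to cpack() */
};

/* Draw a triangle mesh whose diffuse color varies per vertex.  The caller
 * is assumed to have bound a single material beforehand, so no lmbind()
 * appears inside the loop. */
static void draw_mesh_lmcolor(struct mesh_vertex *vert, size_t count)
{
    size_t i;

    lmcolor(LMC_DIFFUSE);          /* color commands now set DIFFUSE */
    bgntmesh();
    for (i = 0; i < count; i++) {
        cpack(vert[i].diffuse);    /* the limited extra data per vertex */
        n3f(vert[i].normal);
        v3f(vert[i].coord);
    }
    endtmesh();
    lmcolor(LMC_COLOR);            /* back to ordinary color commands */
}
```

The mode is set once outside the mesh, so the per-vertex cost is a single
color command rather than a material rebind.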

|>  This of course would allow changing only one property (e.g. DIFFUSE)
|> in this loop (two in the case of LMC_AD). If you HAVE TO change more
|> properties at once (and can't simulate that by changing one property)
|> I believe you have to insert lmcolor commands inside the loop. As I
|> understand Kurt, this is not desirable. I'm really interested in an
|> answer to this problem, as it would eventually force a design
|> decision for our software. We have also observed that the speed
|> differences between uni/multi-colored surfaces (using the first
|> algorithm) vary dramatically when using different graphics platforms.
|> Especially the VGX seems to have problems with that algorithm. Is this
|> observation true and is there an answer to this problem that covers
|> all SGI graphics machines (PI, PI/TG, GT, GTX, VGX)?

The intention of lmcolor is to allow the graphics system to expect a
LIMITED amount of extra data per vertex to modify the material properties.
The point is that we limit the data volume, not the complexity of the
requested material change.  If your algorithm really requires uncorrelated
changes to multiple material properties, then rebinding materials is the
way to go, and will probably never be fast.  If the changes are related,
perhaps a new lmcolor mode should be defined.  I'd like to hear back on
this, though perhaps not over the net.

|>  A second question arises for the LMC_AD mode. How does it work? Does
|> it set AMBIENT and DIFFUSE to the same RGB-values?

Yes.  And ALPHA is set to the alpha value too.
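
To make that concrete, here is a small model of what a packed color does to
the material under LMC_AD.  It is only an illustration of the behavior
described above, not how the hardware implements it; struct material_state
and apply_lmc_ad are hypothetical names, and the 0xAABBGGRR layout is the
packed format that cpack accepts.

```c
/* Simplified, hypothetical model of the material state touched by LMC_AD. */
struct material_state {
    float ambient[3];
    float diffuse[3];
    float alpha;
};

/* Unpack a cpack()-style color (0xAABBGGRR, red in the low byte) and
 * apply it as described for LMC_AD: the same RGB goes to both AMBIENT
 * and DIFFUSE, and the alpha byte sets ALPHA. */
static void apply_lmc_ad(struct material_state *m, unsigned long packed)
{
    float rgb[3];
    int i;

    rgb[0] = (float)(packed & 0xFFUL) / 255.0f;          /* red   */
    rgb[1] = (float)((packed >> 8) & 0xFFUL) / 255.0f;   /* green */
    rgb[2] = (float)((packed >> 16) & 0xFFUL) / 255.0f;  /* blue  */

    for (i = 0; i < 3; i++) {
        m->ambient[i] = rgb[i];
        m->diffuse[i] = rgb[i];
    }
    m->alpha = (float)((packed >> 24) & 0xFFUL) / 255.0f;
}
```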

In the case where graphics performance is traversal limited a performance
increase could result.  On the VGX, for example, lighted vertexes could be
sent 10 to 20 percent faster with such a command.  We felt that this
improvement was not worth the required user recoding effort.  Also, much
real code is either transform limited (by lmcolor, for example) or fill
limited, and would not benefit from such an optimization.  We will continue
to consider more efficient interface commands, however.

|>  Finally I've got a question concerning the memory alignment of normal
|> and coordinate data. From the release notes I know that for the GTX
|> this kind of data has to be quadword aligned to get best performance.
|> We are currently allocating normals and coordinates in one-dimensional
|> arrays (float) of this form:
|> 
|>    x0,y0,z0,x1,y1,z1,....
|> 
|> and pass the address of the xn-element to n3f/v3f. Would it be better to
|> allocate additional (dummy) space for the w-elements to get the
|> best performance on the GTX? As this would mean 33% more memory usage
|> for vertices and normals, we would like to avoid it if possible. How large
|> is the performance loss if one does not use quadword aligned data?
|> What are the effects on other machines (esp. VGX)?

Here are the facts, make of them what you will.  Vertex data are transferred
to GTX and VGX graphics systems using special 3-way operations (see other
SGI publications for explanation).  A 3-way transfer takes 10 bus clocks to
complete if its data are quad-word aligned, 14 bus clocks otherwise.  Since
the bus clock is always 16 MHz, this translates into 1.6 million aligned
transfers per second, and 1.15 million unaligned transfers per second.
If both transform and fill limits support a call rate that is greater than
1.15 million calls per second (counting each c, n, v, and t call) then
quad-word alignment will improve performance.  This situation is common on
well tuned VGX code, somewhat less common on GTX code.
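
The arithmetic above, and its consequence for packed x,y,z arrays, can be
checked with a short sketch.  It assumes 4-byte floats, a 16-byte quad-word,
and an array whose base address is itself quad-word aligned; the function
names are made up for illustration.  The exact quotient for the unaligned
case is 1,142,857 transfers per second, in line with the rounded figure
quoted above.

```c
#include <stddef.h>

#define BUS_CLOCK_HZ     16000000L  /* 16 MHz bus clock, from the text  */
#define ALIGNED_CLOCKS   10L        /* quad-word aligned 3-way transfer */
#define UNALIGNED_CLOCKS 14L        /* unaligned 3-way transfer         */
#define QUADWORD_BYTES   16         /* assumption: four 32-bit words    */

/* Peak 3-way transfers per second for a given cost in bus clocks. */
static long transfers_per_second(long clocks_per_transfer)
{
    return BUS_CLOCK_HZ / clocks_per_transfer;
}

/* Of `count` consecutive vertices stored `floats_per_vertex` floats apart,
 * count how many start on a quad-word boundary.  The base of the array is
 * assumed to be quad-word aligned. */
static size_t aligned_vertex_count(size_t count, size_t floats_per_vertex)
{
    size_t i, aligned = 0;

    for (i = 0; i < count; i++) {
        size_t offset = i * floats_per_vertex * sizeof(float);
        if (offset % QUADWORD_BYTES == 0)
            aligned++;
    }
    return aligned;
}
```

With three floats per vertex only every fourth vertex lands on a quad-word
boundary, while padding each vertex to four floats (the dummy w element)
aligns all of them at the cost of a third more memory.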

-- kurt