Misc.

Thu Dec 22 03:44:35 AEST 1988

In article <8812200602.aa17057 at SMOKE.BRL.MIL>, XBR2D96D at DDATHD21.BITNET (Knobi der Rechnerschrat) writes:
> Hello Netlanders,
> 
>   I've a few questions about SGI's new GTX architecture. They are based
> on the 3.1 release notes and a document called "IRIS GTX: A Technical
> Report, Rev 2":
> 
> - which type of CPU (16 MHZ or 25 MHZ) and how many of them do I need
>   to get the full graphics speed (100.000 Z-buffered 4-sided, G-shaded,
>   P-lighted, independent polygons). I ask this question, because one of
>   SGI's competitors (they have a vector/parallel-oriented Workstation
>   with up to 4 CPU's, Graphics computations done in the CPU) had to admit
>   (after applying some Spanish Inquisition tools) that they need 4 CPU's
>   to reach their maximum graphics performance and that there may exist
>   situations, where graphics can consume all resources of the system.

ALL GTX class machines can reach full graphics performance with a single
CPU driving the graphics.  In a 4-popper, this means you get >3 CPUs
of compute performance to use as you wish.  (Unlike the competition, a GTX
has 100 MFlops dedicated to graphics; the CPU performance is yours to use
or abuse as you wish).

Part of this is the result of a custom bus cycle and small block DMA facility
which the processor uses to send geometry to the pipeline.  We call this
feature the "3-way-transfer".  More below ...

> - Chapter "8.2 Graphics Notes" in the 4D-3.1 release notes states that
>   some of the graphics routines (c3*, c4*, n3f, v2*, v3*, v4*) should be
>   called with quadword-aligned data to get full GTX performance.
>   Does this mean all the variables have to be "double" (which I don't
>   believe) or that the first byte of a "float x[3]" vector has to start
>   on a quadword-address? In the latter case I only have to rearrange our
>   data-structures.

As you surmised, the quadword alignment applies only to the first byte of
the data structure you are sending.  The reason alignment is needed for full
performance is related to the 3-way-transfer and the MP backplane.

As in most multiprocessors, memory data is transferred in large blocks for
efficiency, and then cached at each CPU.  The POWERSeries uses a 4-word
(16-byte) cache line, which is also the basic unit of transfer to the
graphics pipeline.  The 3-way-transfer is designed to allow the programmer
to lay out data in an arbitrary way without alignment restrictions.
If your vertex crosses a 4-word boundary, though, two bus cycles will be
necessary to send the data (hence the "3-way": the first part of the data
may come from cache or memory, and the second part may come from some other
cache or memory, or the initiating CPU may own none of the data, in which
case other cache(s) or memory will supply the data). [Sorry if this is
confusing; remember that the POWERSeries uses write-back caching, so the
"real" memory image is distributed between caches and memory.]

Quadword-aligning the vertex ensures that the transfer happens in a single
bus cycle, giving you the best performance (but remember, your code will
still work, no matter how the data is aligned).
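
To make the alignment arithmetic concrete, here is a minimal sketch in
modern C (C11, not the compiler shipped with 3.1).  It only models the
rule described above, that a vertex spanning two 16-byte lines costs two
bus cycles; it does not measure or drive any hardware.  The names
lines_touched, vertex_lines and padded_vertex are hypothetical, and a
4-byte float is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 4 words x 4 bytes: the line size quoted in the post. */
#define LINE_BYTES 16u

/* Count how many 16-byte lines the byte range [addr, addr+len) touches.
 * Under the rule above, 1 means a single bus cycle and 2 means two. */
unsigned lines_touched(uintptr_t addr, size_t len)
{
    assert(len > 0);
    uintptr_t first = addr / LINE_BYTES;
    uintptr_t last  = (addr + len - 1) / LINE_BYTES;
    return (unsigned)(last - first + 1);
}

/* Hypothetical layout: one vertex per 16-byte slot.  The C11 _Alignas
 * specifier puts xyz on a quadword boundary, and the pad word keeps
 * every element of an array of these aligned as well. */
typedef struct {
    _Alignas(16) float xyz[3];
    float pad;
} padded_vertex;

/* Lines touched by a 3-float vertex, such as one passed to v3f(). */
unsigned vertex_lines(const float *v)
{
    return lines_touched((uintptr_t)v, 3 * sizeof(float));
}
```

With tightly packed "float x[3]" vertices starting on a quadword boundary,
the vertices at byte offsets 12 and 24 each straddle a line, while the ones
at offsets 0 and 36 do not.  Padding each vertex to 16 bytes trades a word
of memory per vertex for a single line on every transfer.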

> - does shademodel(FLAT) work again under 3.1?

I hope so.

-- Jim Barton
Silicon Graphics Computing Systems    "UNIX: Live Free Or Die!"
jmb at sgi.sgi.com, sgi!jmb at decwrl.dec.com, ...{decwrl,sun}!sgi!jmb

  "I used to be disgusted, now I'm just amused."
			- Elvis Costello, 'Red Shoes'
--