vectorization of array maximum

Fri Sep 28 07:39:20 AEST 1990

>From article <16279 at thorin.cs.unc.edu>, by cullip at vangogh.cs.unc.edu (Timothy Cullip):
> I'm reasonably new to the Cray and am trying to do something
> simple: find the maximum element in an array.

   You have picked a very interesting example to start off with.  The
code you submit is actually nearly optimal (asymptotically, it is
optimal, if compiled correctly) in the case of a randomly-ordered
array.  Note that it can easily be modified to return the location of
the maximum as well as the value by saving it when it occurs.

   The alternative is to use the Cray library routine ISMAX(3SCI).
This has both advantages and disadvantages over your routine.  The big
disadvantage is that it will be a factor of two-plus slower in the
randomly-ordered case (as I discovered the first time I tried to use
it in an application where I cared about speed).

   It uses a search algorithm which is a common paradigm for Cray
programming.  Generate a vector with the first 64 elements of your
array.  Then read the next 64 elements; compare each to the corres-
ponding element of your vector, and choose the larger.  Repeat until
your array is exhausted.  One of the 64 elements of your vector will
be the maximum.  Of course, you have to deal with fragments which are
not multiples of 64, and further it is not trivial to write efficient
high-level code for this algorithm (the actual library routine is
written in assembler).

   It takes only a little knowledge of Cray hardware to realize that
the above algorithm, when perfectly optimized, is about two times
slower than your algorithm, as long as you stay in your inner loop.
Your code just does one-plus chimes of work on each vector, setting
the vector mask and then testing it for nonzero (the memory fetch and
floating subtract should be overlapped or chained); the Cray code does
two-plus chimes, setting the vector mask and then always a vector
merge.  The "plus" is then to go back and find the maximum, which
requires looking again at every 64th element.  If you want to be
clever you can avoid a stride of 64 here either by setting the vector
length to 63 or (more complicated) by using a nonunit stride near n/64
in the first pass.

In the random-long-vector case, therefore, your code will be much
better.  The break out of the vector loop only occurs, on average,
about lg n times out of n/64 vector iterations, which is
asymptotically negligible.  The Cray code has the distinct advantage,
however, that its worst case is the same as its average case.  Your
worst case is incredibly bad; if the array happens to be monotonically
increasing, I would expect your code to be hundreds of times slower
than in the random case.  You may also pay more overhead than you like
when the array is fairly short.  (Solving this problem efficiently on
short arrays is difficult.  But in general you should almost never
attack a problem this way in a time-critical application with short
arrays, but rather find 64 maxima in parallel.)

It might make sense to merge the two algorithms by implementing the
Cray algorithm but with a check for zero-vector-mask.  This would mean
that in the case that none of the 64 array elements is greater than
the corresponding vector element (still "almost always" in the
asymptotic random case), that a single chime would suffice, while the
worst case would not be much worse than the the existing Cray code.
Also, of course, if you want only the value of the maximum and not its
location, a version of the Cray code without that feature would be
slightly faster (this is more significant on a Cray-2 than on an
[XY]-MP, because the stride-64 problem is greater).

   -- David desJardins