Cray Autotasking

Tue Nov 28 05:52:54 AEST 1989

In article <MCCALPIN.89Nov27080806 at masig3.ocean.fsu.edu> 
mccalpin at masig3.ocean.fsu.edu (John D. McCalpin) writes:
> Does anyone know offhand how the Cray autotasking splits loops between
> processors?

I have not looked at autotasking lately, but I assume it works much like 
microtasking did.

The allocation of tasks to processors is dynamic and therefore cannot be 
statically determined.  You must assume they will be allocated in any 
order to any number of available processors.  (Welcome to the wonderful 
non-deterministic world of parallel processing.)
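
To make the point concrete, here is a minimal sketch in Python (not Cray Fortran, and not how autotasking is implemented). Python threads stand in for CPUs, and the names partial_sum, parallel_sum, chunk and workers are made up for illustration. The chunks finish in whatever order the scheduler happens to produce, so the code is written so that the answer does not depend on that order.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def partial_sum(data, start, stop):
    """Work done by one task: sum one contiguous chunk of the loop."""
    return sum(data[start:stop])


def parallel_sum(data, chunk=64, workers=4):
    """Sum data by handing chunks to whichever worker is free.

    Completion order is not fixed, so partial results are combined
    with an operation (integer addition) that does not care about order.
    """
    ranges = [(s, min(s + chunk, len(data)))
              for s in range(0, len(data), chunk)]
    total = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(partial_sum, data, s, e) for s, e in ranges]
        for done in as_completed(futures):  # arrival order may vary per run
            total += done.result()
    return total
```

The same call gives the same total whether it runs on one worker or eight, which is the property you want from any loop you hand to a dynamic scheduler.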

The most efficient way to autotask is to have many tasks of vectorized 
loops.  The efficiency of vectorization increases with the vector length.  
The efficiency of load balancing the tasks increases with the number of 
tasks.  Lengthening the vectors decreases the number of tasks.  
Optimizing vectorization is usually more important than optimizing 
autotasking.

Breaking the problem into 8 tasks to match 8 CPUs will work well in 
benchmark or dedicated situations, but will be counter-productive in a 
mixed job environment.  You will probably get less than the full 
complement of 8 CPUs, leaving straggling long vectors to process at the 
end.
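
A rough way to see the straggler effect is a toy model, sketched here in Python. It assumes equal-sized tasks, no scheduling overhead, and a greedy rule where each task goes to the first CPU that becomes free. The function name makespan and its parameters are hypothetical, and the model says nothing about real Cray timings.

```python
import heapq


def makespan(num_tasks, total_work, num_cpus):
    """Finish time when equal tasks are handed out to the first free CPU."""
    task_time = total_work / num_tasks
    free_at = [0.0] * num_cpus       # time at which each CPU is next idle
    heapq.heapify(free_at)
    for _ in range(num_tasks):
        start = heapq.heappop(free_at)            # earliest available CPU
        heapq.heappush(free_at, start + task_time)
    return max(free_at)              # the last CPU to finish sets the time


# 8 units of work in total:
dedicated = makespan(8, 8.0, 8)      # 8 tasks, all 8 CPUs available
shared = makespan(8, 8.0, 5)         # 8 tasks, only 5 CPUs available
finer = makespan(64, 8.0, 5)         # 64 smaller tasks, same 5 CPUs
```

In this model the 8-task split takes 1.0 time units on 8 CPUs but 2.0 on 5 CPUs, because three leftover tasks run alone in a second round. Splitting the same work into 64 tasks brings the 5-CPU time down to 1.625, close to the ideal of 8/5 = 1.6.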

In general, you split the loop into tasks of vectors of length 64 to 128.  
This provides near-optimal vector lengths while still keeping the 
number of tasks high.
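
The split itself is simple index arithmetic. The sketch below is Python rather than Fortran, and split_into_tasks and chunk are illustrative names, not part of any Cray tool. Each (start, stop) pair is one task whose inner loop would be a single vector operation of at most chunk elements.

```python
def split_into_tasks(n, chunk=64):
    """Cut the index range 0..n-1 into consecutive pieces of at most chunk."""
    return [(start, min(start + chunk, n)) for start in range(0, n, chunk)]


tasks_128 = split_into_tasks(1000, chunk=128)   # fewer tasks, longer vectors
tasks_64 = split_into_tasks(1000, chunk=64)     # more tasks, shorter vectors
```

For a loop of 1000 iterations this gives 8 tasks at chunk=128 (the last one covering 104 elements) and 16 tasks at chunk=64 (the last one covering 40), which is the trade between vector length and task count described above.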

If autotasking is up to snuff, you would present it with one long loop, 
and it would vectorize and parallelize that one loop for you, choosing a 
near-optimal distribution.

Does anyone know if autotasking will split one loop into multiple tasks of 
vectorized loops yet?  My assumption is that autotasking works like 
microtasking.  Is this assumption valid?

Kent Koeninger   Cray Evangelist    Apple Computer   <koeninger at apple.com>