Cray Autotasking
R. Kent Koeninger
koeninger at apple.com
Tue Nov 28 05:52:54 AEST 1989
In article <MCCALPIN.89Nov27080806 at masig3.ocean.fsu.edu>
mccalpin at masig3.ocean.fsu.edu (John D. McCalpin) writes:
> Does anyone know offhand how the Cray autotasking splits loops between
> processors?
I have not looked at autotasking lately, but I assume it works much like
microtasking did.
The allocation of tasks to processors is dynamic and therefore cannot be
statically determined. You must assume that tasks will be allocated in any
order to any number of available processors. (Welcome to the wonderful
nondeterministic world of parallel processing.)
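A minimal sketch of this kind of dynamic (self-scheduled) allocation: the work is cut into chunks, and whichever worker is idle takes the next chunk. Python threads stand in for CPUs here, and the function name run_self_scheduled is hypothetical, not part of any Cray interface. The assignment of chunks to workers changes from run to run, while the combined result does not.

```python
import queue
import threading

def run_self_scheduled(chunks, num_workers, work):
    """Run work(chunk) for every chunk; idle workers grab the next chunk.

    Which worker runs which chunk, and in what order, varies from run to
    run; only the collected results (indexed by chunk) are reproducible.
    """
    todo = queue.Queue()
    for item in enumerate(chunks):
        todo.put(item)
    results = [None] * len(chunks)
    owner = [None] * len(chunks)  # which worker happened to run each chunk

    def worker(wid):
        while True:
            try:
                i, chunk = todo.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is done
            results[i] = work(chunk)
            owner[i] = wid

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, owner

# Example: sum of squares over 0..999 in 64-element chunks on 4 workers.
n = 1000
chunks = [range(s, min(s + 64, n)) for s in range(0, n, 64)]
results, owner = run_self_scheduled(chunks, 4, lambda r: sum(x * x for x in r))
print(sum(results) == sum(x * x for x in range(n)))  # True
print(owner)  # worker ids; differs between runs
```

Because of Python's global interpreter lock this shows the scheduling pattern only, not a real parallel speedup.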
The most efficient way to autotask is to have many tasks, each a vectorized
loop. The efficiency of vectorization increases with the vector length,
while the efficiency of load balancing increases with the number of tasks.
For a fixed amount of work, lengthening the vectors decreases the number of
tasks, so the two goals pull against each other. Optimizing vectorization is
usually more important than optimizing autotasking.
Breaking the problem into 8 tasks to match 8 CPUs will work well in
benchmark or dedicated situations, but will be counterproductive in a
mixed job environment. You will probably get fewer than the full
complement of 8 CPUs, leaving straggling long-vector tasks to process at
the end.
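The straggler effect can be illustrated with a toy simulation. It assumes a made-up cost model (a fixed overhead STARTUP = 20 per task plus one time unit per element) and a scheduler that hands each task to whichever CPU goes idle first; the numbers are illustrative, not Cray timings.

```python
import heapq

STARTUP = 20  # hypothetical fixed cost per task (dispatch + vector startup)

def task_cost(length):
    """Toy cost model: fixed per-task overhead plus one unit per element."""
    return STARTUP + length

def makespan(task_lengths, num_cpus):
    """Finish time when each task goes to whichever CPU is idle first."""
    free_at = [0] * num_cpus  # time at which each CPU becomes idle (a heap)
    for length in task_lengths:
        t = heapq.heappop(free_at)  # earliest-idle CPU takes the next task
        heapq.heappush(free_at, t + task_cost(length))
    return max(free_at)

N = 8192
coarse = [N // 8] * 8      # 8 tasks of 1024 elements, one per CPU
fine = [128] * (N // 128)  # 64 tasks of 128 elements

for cpus in (8, 5):
    print(cpus, makespan(coarse, cpus), makespan(fine, cpus))
```

Under these assumptions the 8-task split finishes first on a dedicated 8-CPU machine (1044 vs 1184 units), but when only 5 CPUs are available, three 1024-element tasks must wait for a free CPU and the 64-task split wins (1924 vs 2088).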
In general, you should split the loop into tasks whose vectors are 64 to
128 elements long. This gives near-optimal vector lengths while still
maximizing the number of tasks.
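One way to do such a split by hand, sketched under the assumption that the loop runs over iterations 0..n-1 and that chunks should be contiguous and nearly equal in length (split_into_tasks is a hypothetical helper, not an autotasking directive):

```python
def split_into_tasks(n, max_len=128):
    """Split loop iterations 0..n-1 into contiguous chunks of nearly equal
    length, each at most max_len elements long.

    Using the fewest chunks allowed by max_len keeps vectors long; with
    max_len=128, any n >= 64 yields chunks of 64 to 128 elements.
    """
    if n <= 0:
        return []
    k = -(-n // max_len)        # ceil(n / max_len) chunks
    base, extra = divmod(n, k)  # the first `extra` chunks get one more element
    chunks, start = [], 0
    for i in range(k):
        length = base + (1 if i < extra else 0)
        chunks.append(range(start, start + length))
        start += length
    return chunks

for n in (129, 300, 1000, 8192):
    lengths = [len(c) for c in split_into_tasks(n)]
    print(n, len(lengths), min(lengths), max(lengths))
```

For n = 1000 this gives 8 chunks of 125 elements, and for n = 8192 it gives 64 chunks of 128. Passing a smaller max_len (for example 64) trades vector length for more tasks.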
If autotasking is up to snuff, you would present it with one long loop,
and it would vectorize and parallelize that loop for you, choosing a
near-optimal distribution.
Does anyone know whether autotasking will split one loop into multiple
tasks of vectorized loops yet? My assumption is that autotasking works like
microtasking. Is this assumption valid?
Kent Koeninger Cray Evangelist Apple Computer <koeninger at apple.com>