(none)

Tue May 9 02:47:27 AEST 1989

In article <890505182901.2cc14a47 at SCRI1.SCRI.FSU.EDU>, MCCALPIN at SCRI1.SCRI.FSU.EDU writes:
> We have a Power Series 120 machine here for a demo/loan, and
> I have had trouble getting anything useful out of the Power
> Fortran preprocessor.  The code is basically a whole bunch of double
> DO loops, with an iteration count of about 40 on the outer loop and
> 100 on the inner loops.
> 
> I ran the code with the following command:
> 	f77 -O2 test.f -o test
> 	time test
> and it took 16.3 seconds
> 
> I re-ran it with: f77 -pfa keep -O2 test.f -o test
> and it took 26.4 seconds!
> 
> I ran it again with: f77 -pfa keep -WK,-O=4,-UR=4 -O2 test.f -o test
> and it got back down to 16.2 seconds.
> 
> This code seems ideal for loop-splitting parallelization, and the
> intermediate code files show DOACROSS directives on all the important
> loops.
> 
> Anybody have any ideas of something I might be doing wrong?
> 

One thing that comes to mind is that you might have some initialization
loops and the like that are parallelized but don't have enough work in
each chunk to justify the synchronization overhead.  If you haven't
already, profile the single-processor version (using both pc-sampling
[-p] and pixie) and compare the results to the intermediate files
generated by pfa.  Look for parallelized loops that account for 1% or
less of the program's execution time.  The next trick is to remove the
unwanted DOACROSS directives from the .m file, rename it as a .f file,
and recompile like so:

f77 -mp -nocpp foo.f -O2 -o foo 

to generate a new parallelized executable.
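
To make the "not enough work per chunk" point concrete, here is a rough
back-of-the-envelope model written as a small shell function.  It is only
a sketch: the function name worth_parallelizing, the two-processor count,
and the timing and overhead figures are all made up for illustration and
are not measurements from any real machine.  It assumes the parallel time
for a loop is roughly (serial time / processors) plus a fixed
synchronization cost.

```shell
#!/bin/sh
# Rough cost model (illustrative assumption, not a measurement):
#   serial time   = t
#   parallel time = t / nproc + overhead
# t is the time for one execution of the loop; overhead is the fixed cost
# of starting and synchronizing the parallel region.  All three values
# use the same unit (microseconds here).

# worth_parallelizing T NPROC OVERHEAD   (hypothetical helper)
# Prints "keep" if the parallel estimate beats the serial time,
# otherwise "remove" (i.e. drop the DOACROSS for that loop).
worth_parallelizing() {
    awk -v t="$1" -v p="$2" -v o="$3" 'BEGIN {
        parallel = t / p + o
        if (parallel < t) print "keep"; else print "remove"
    }'
}

# A tiny initialization loop: 50 us of work, 2 processors, 100 us overhead.
# Parallel estimate is 50/2 + 100 = 125 us, worse than 50 us serial.
worth_parallelizing 50 2 100

# A heavy compute loop: 5000 us of work on the same machine.
# Parallel estimate is 5000/2 + 100 = 2600 us, better than 5000 us serial.
worth_parallelizing 5000 2 100
```

With these invented numbers the first call prints "remove" and the second
prints "keep".  The real overhead has to come from your own profiling, but
the shape of the tradeoff is the same: small loops lose, big loops win.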

Hope this helps,

Archer Sully
archer at sgi.com

"life is short, and full of stuff"

		-- Lux Interior
--