Summary on IEEE error handler for SUN FORTRAN (L
Zhang Yun Fei
zhang at buast7.bu.edu
Fri Jan 25 06:50:46 AEST 1991
This is a summary to the following question I posted to the net yesterday.
I got solutions/suggestions from the following netters.
khb at Eng.Sun.COM
borcherb at turing.cs.rpi.edu
mckie at sky.arc.nasa.gov
larry at pylos.cchem.berkeley.edu
carlo at nu.uchicago.edu
As it seems a rather popular problem among number crunchers, I
summarize the answers in the rest of this message.
Thanks to all of you who kindly responded to my question. I appreciate it.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Yun Fei Zhang % E-mail: %
% Astronomy Department % SPAN: east::"zhang at buast0.bu.edu" %
% Boston University % BITNET: zhang at buasta %
% 725 Commonwealth Ave. % INTnet: zhang at buast0.bu.edu %
% Boston, MA 02215 % zhang at bu-ast.bu.edu %
%--------------------------------------------------------------------------%
% TEL: (617)-353-8917 %
% TELEX: 95-1289 BOS UNIV BSN %
% TELEFAX: (617)-353-5704 %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-----------------------------------------------------------------------------
ORIGINAL QUESTION POSTED:
I have a question about the error handler of SUN FORTRAN. The question is
how to find the location where an arithmetic error occurs in a program.
This is especially helpful as I am writing a computation-intensive code.
On VAX/VMS machines, the code will crash when it encounters such an
arithmetic error and tell the user where it occurred. However, on SUNs, it
only shows a message at the end of the job that says something like the
following:
> Warning: the following IEEE floating-point arithmetic exceptions
> occurred in this program and were never cleared:
> Inexact; Division by Zero; Invalid Operand; .....
My question is how to determine the point(s) in the code where the
indicated arithmetic exceptions happened. Modifying the IEEE error
handler (e.g. sigfpe_ieee, etc.) seems a possible approach, but it
involves changes in these lower-level routines, which I am reluctant to
try. Is there any other option I can use to achieve the same goal (e.g.,
compilation/linking options or software tools)?
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
SOLUTIONS:
khb at Eng.Sun.COM pointed in the right direction for a solution in the first
response received:
From f77 code, as mentioned in the Numerical Computation Guide, and
the Fortran User's Guide
i = ieee_handler("set","common",%val(2)) ! aka SIGFPE_ABORT
! if you use the .h
! file with mathincludes
will cause execution to stop on divide by zero, operations on NaNs etc. If
you want to catch inexact, you can do that too (ask for it by name).
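A minimal sketch of a complete test program built around the call above, assuming Sun f77 and its ieee_handler library routine. The program name trapdemo and the variables are made up for illustration, and the check on the return value assumes that ieee_handler returns 0 on success.

```fortran
      program trapdemo
      integer ieeer, ieee_handler
      real x, y
c     %val(2) is SIGFPE_ABORT, as given in the reply above
      ieeer = ieee_handler('set', 'common', %val(2))
c     assumption: a nonzero return means the handler was not installed
      if (ieeer .ne. 0) write(*,*) 'could not set IEEE handler'
c     deliberate division by zero; with the handler set, the program
c     should stop here instead of going on to print the result
      x = 0.0
      y = 1.0 / x
      write(*,*) 'y = ', y
      end
```

Replacing 'common' with another exception name (for example 'inexact') should select a different set of exceptions, as the reply says.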
####
And borcherb at turing.cs.rpi.edu points out the following:
Try man f77_ieee_environment for documentation on how to do this.
Unfortunately, I believe that on SUN-4's, the handler doesn't actually
report the address at which the exception happened. I'm told that this is
because of the pipelined nature of the SPARC processor. At any rate, I
was unable to get this working on a SUN-4. However, my code does stop the
program as soon as the SIGFPE signal is sent to it.
####
% The most pragmatic approach, I think, is from mckie at sky.arc.nasa.gov as
% he wrote:
On our Sun, there were so many people who had the same question as you
about how to find ieee errors in a fortran program that I set up a man
page to try to explain it. I'll include a part of that man page below.
It seems a bit strange in comparison to your vms experiences, but after
you've done it a few times, it's not too bad. And the ieee approach is
more flexible & more under your control.
-Bill McKie
NASA Ames Research Center
mckie at sky.arc.nasa.gov
=============================================================
SYNOPSIS
The following is an abbreviated description of how to use the Sun DBX
debugger to find where floating point errors are occurring in a fortran
program.
DESCRIPTION
Step 1.
Add the following statements to the program's main module:
external handler
call ieee_handler('set','common',handler)
The "external" statement is a declaration, and should appear
in the preliminary non-executable statements section of the
main program source code. The call to "ieee_handler" should
be placed into the main program as one of the first
executable statements.
Step 2.
Add the following "stub" subroutine to the main program
source code:
subroutine handler(i1,i2,i3,i4)
end
The subroutine name "handler" is arbitrary, but must be the
same in the subroutine statement, the external statement,
and the call ieee_handler statement's 3rd argument.
Step 3.
Compile the program as usual, but everywhere the f77 command
is used, include the "-g" option in the f77 command line.
E.g. if the program is entirely in the file prog.f (including
the handler subroutine), then the following could be
used:
f77 -g -o prog prog.f
Step 4.
The program is now ready to run, and it could simply be run
as usual using the "prog" command. However, to find the
place where a floating point error is occurring, the dbx
debugger utility is used to control execution of the program.
This is how it is run:
dbx
dbx> debug prog
dbx> catch FPE
dbx> run
signal FPE in <routine name> at line n in file <file_name>
dbx> quit
The "dbx>" are prompts from the debugger. The line following
the "run" command is output by the debugger, and is a
clue as to where the error occurred.
Step 5.
Edit the file <file_name> and move to line n to see where
the error was occurring.
SEE ALSO
dbx(1) dbxtool(1) f77(1)
LIMITATIONS
The above description demonstrates only a small subset of
the dbx debugger's capabilities. See the dbx user's manual
for more information on what dbx can do.
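Steps 1 and 2 above can be put together into a small prog.f for trying out the dbx session. This is only a sketch, assuming Sun f77. The division by zero is added here purely to trigger the signal, and ieee_handler is called as an integer function (ieeer is a made-up variable name) rather than with "call" as in Step 1.

```fortran
      program prog
      external handler
      integer ieeer, ieee_handler
      real x, y
c     Step 1: install the stub handler for the 'common' exceptions
      ieeer = ieee_handler('set', 'common', handler)
c     deliberate division by zero so that dbx has something to catch
      x = 0.0
      y = 1.0 / x
      write(*,*) 'y = ', y
      end

c     Step 2: the stub handler; it does nothing by itself, it only
c     gives the trap somewhere to go so that dbx can report the line
      subroutine handler(i1, i2, i3, i4)
      end
```

After compiling with "f77 -g -o prog prog.f", the dbx session in Step 4 should stop at the line containing the division.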
####
% larry at pylos.cchem.berkeley.edu contributed another approach:
I'm not sure that I understand what you are asking, but here is the code
that I use to make the default behavior similar to what you describe - die
on divide by zero, etc. It requires the use of the C-preprocessor on a Sun
to make it a 'compile time option'.
#if ERROR && SUN
#include <f77/f77_floatingpoint.h>
external error_handler
integer ieeer,ieee_handler,error_handler
#endif
cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
c set up the error handler to barf on all exceptions and then clear
c the inexact exceptions which occur with any floating point operation
c these should be the first executable lines in your code.
#if SUN && ERROR
ieeer=ieee_handler('set','all',error_handler)
ieeer=ieee_handler('clear','inexact',error_handler)
#endif
c a separate function
#if SUN && ERROR
c error handler that is called by the IEEE package on the sun
integer function error_handler(sig,code,sigcontext)
integer sig,code,sigcontext(5)
character label*16
if (loc(code).eq.212) label='overflow'
if (loc(code).eq.208) label='invalid'
if (loc(code).eq.204) label='underflow'
if (loc(code).eq.200) label='division'
if (loc(code).eq.196) label='inexact'
write(*,*) 'IEEE exception code ',loc(code),
2 ' ( ',label(1:lnblnk(label)),' ) occurred at pc ',sigcontext(4)
c any error processing can be done here. I just choose to kill the
c program gracefully
call abort(' IEEE exception code - Program Halted')
stop
end
#endif
########
% carlo at nu.uchicago.edu showed me a similar trick, which can track the
% chain of calling routines down to the point of failure:
Hi. The following method is the one that I have had to resort to. It's a
bit of a kludge, but at least it allows identification of the routine
within which the exception occurred, and the problem can usually be
identified using dbx:
program foobawooba
external handler
common /debug1/nlevel
common /debug2/ stack(20)
character stack*25
data nlevel/1/stack/'main',19*''/
ieeer=ieee_handler('set','common',handler)
c 'common' handles invalid, overflow, and division exceptions --- see
c "man ieee_handler". handler is an external routine shown at the end
c of this example. This line will call handler whenever a 'common'
c exception occurs.
[ some code here]
call haha1
[ more code]
stop
end
subroutine haha1
common /debug1/nlevel
common /debug2/ stack(20)
character stack*25
nlevel=nlevel+1
stack(nlevel)='haha1'
.
call haha2
.
stack(nlevel)=''
nlevel=nlevel-1
return
end
subroutine haha2
common /debug1/nlevel
common /debug2/ stack(20)
character stack*25
nlevel=nlevel+1
stack(nlevel)='haha2'
.
stack(nlevel)=''
nlevel=nlevel-1
return
end
integer function handler ( sig, code, sigcontext )
common /debug1/nlevel
common /debug2/ stack(20)
character stack*25
integer sig
integer code
integer sigcontext(5)
write(6,*) 'Bomb! Here comes a stack dump:'
do 1 i=1,nlevel
write(6,*) stack(i)
1 continue
write(6,*) 'Number of levels:',nlevel
call abort
end
The effect of all these (admittedly ugly and machine specific) gymnastics
is that the routine in which the exception occurred is pinpointed by the
array 'stack' and the variable 'nlevel'. Since execution is halted by
means of 'call abort', all the debugging information is still available,
and the problem may be identified (if the debugger was active) by
examining the guilty routine. The ass paining part of all this is that
the lines affecting 'stack' and 'nlevel' must be included in every routine
in the program. I'm not that happy with it, but it's the best I've been
able to do. One might wish that it would dawn on the *%&#!!!? C
programmers who developed Fortran for the Sun that for the purposes of
scientific programming, failure to *automatically* halt on division by
zero is a bug, not a feature :-(> .
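The bookkeeping lines that have to go into every routine can be reduced to one call on entry and one on exit. The following is only a sketch of that idea, not part of the original method: the names trpush, trpop, trdemo and work are made up for illustration, and the common blocks are the ones used in the example above.

```fortran
      program trdemo
      common /debug1/ nlevel
      common /debug2/ stack(20)
      character stack*25
c     level 1 is the main program, as in the example above
      nlevel = 1
      stack(1) = 'main'
      call work
      write(6,*) 'Back in main, number of levels:', nlevel
      end

      subroutine work
      call trpush('work')
c     ... the body of the routine goes here ...
      call trpop
      return
      end

      subroutine trpush(name)
c     record entry into a routine on the shared name stack
      character*(*) name
      common /debug1/ nlevel
      common /debug2/ stack(20)
      character stack*25
      nlevel = nlevel + 1
c     levels deeper than 20 are counted but not stored
      if (nlevel .le. 20) stack(nlevel) = name
      return
      end

      subroutine trpop
c     remove the most recent entry just before a routine returns
      common /debug1/ nlevel
      common /debug2/ stack(20)
      character stack*25
      if (nlevel .le. 20) stack(nlevel) = ' '
      nlevel = nlevel - 1
      return
      end
```

Because trpush keeps counting beyond 20 levels without storing the names, a handler that prints the stack would need to limit its loop to min(nlevel,20).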
I hope this helps.
Carlo Graziani
#####
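For readers using a compiler that provides the Fortran 2003 intrinsic module ieee_exceptions (an assumption; none of the replies above mention it, and it is not part of Sun f77 as discussed here), halting on division by zero can be requested without any vendor-specific routine. A minimal sketch, with a made-up program name:

```fortran
      program haltdemo
      use, intrinsic :: ieee_exceptions
      implicit none
      real :: x, y
      ! ask the processor to halt on division by zero, if it can
      if (ieee_support_halting(ieee_divide_by_zero)) then
         call ieee_set_halting_mode(ieee_divide_by_zero, .true.)
      end if
      x = 0.0
      ! deliberate division by zero
      y = 1.0 / x
      print *, 'y = ', y
      end program haltdemo
```

An optimizing compiler may evaluate the division at compile time, in which case no exception is raised at run time, so a test like this is best built without optimization and with debugging information turned on.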
More information about the Comp.sys.sun
mailing list