When a program crashes or runs incorrectly, it is often possible to use an interactive debugger to
see bad data at the site of the crash. It is sometimes difficult, however, to find where the bad data came from.
The FPT run-time trace facility is a technology for tracing arithmetic errors.
FPT instruments selected components of the code so that every scalar quantity assigned on the
left-hand side of a statement is written to file.
Please click here for an example program which
models a cannon ball with square-law drag.
This code is modified by FPT to capture the outputs of every statement.
Click here for the modified code.
trace_start_sub_program logs the start of a program,
subroutine or function. The FPT library routines
trace_i4_data etc. write the left-hand-side
data to file. These routines are self-initialising - the log file is created on the
first call if it does not already exist.
The output shows entries to sub-programs, and the left-hand-side quantities,
written one to each line. Click here to see the trace output.
The main loop, starting with the copy of
p_hddot and ending with the computation of the new
x can clearly be seen.
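The idea of a self-initialising trace routine can be sketched in a few lines. The following is a minimal illustration in Python, not the FPT library itself: the file name trace_sketch.log, the helper _log, the generic trace_data routine and the step function are all hypothetical, and the log format is invented for the example.

```python
TRACE_PATH = "trace_sketch.log"  # hypothetical file name, not FPT's
_trace_file = None               # opened lazily on the first call


def _log(line):
    """Append one line to the trace, opening the file on first use."""
    global _trace_file
    if _trace_file is None:
        # Append mode creates the file if it does not already exist.
        _trace_file = open(TRACE_PATH, "a")
    _trace_file.write(line + "\n")
    _trace_file.flush()


def trace_start_sub_program(name):
    """Log entry to a sub-program."""
    _log("Entering " + name)


def trace_data(name, value):
    """Log a left-hand-side value and hand it back unchanged."""
    _log(f"{name} = {value!r}")
    return value


def step(x, v, dt):
    """An instrumented sub-program: each assignment is wrapped."""
    trace_start_sub_program("step")
    x = trace_data("x", x + v * dt)
    return x
```

Calling step(0.0, 2.0, 0.5) appends two lines to the log, one for the entry to the sub-program and one for the new value of x, which mirrors the one-entry-per-line layout described above.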
If a program reads or generates bad data, it is usually possible to trace
back to the cause of the problem. The output files can become very large, so we
recommend that only a small sub-set of the files in a large program should be
instrumented in this way.
Please see the FPT command-line reference page for a description of the
procedure for using this facility.
The same program may produce significantly different results when it is built with different compilers or with different levels of
optimisation under the same compiler. The differences may be due only to numerical drift. This occurs when different systems choose
different orders of execution, or different variables to store in processor registers, with the result that there are small differences
in rounding errors. These differences integrate and eventually affect the results. However, the differences may also be due to compiler
bugs or to coding errors which behave differently in different environments.
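A small illustration of how the order of execution alone changes a result, written in Python with standard double-precision arithmetic. The variable names are chosen for the example only.

```python
# The same three numbers summed in two different orders.
left_first = (0.1 + 0.2) + 0.3
right_first = 0.1 + (0.2 + 0.3)

# The two sums differ in the last bit because each intermediate
# result is rounded before the next addition.
print(left_first == right_first)        # False
print(abs(left_first - right_first))    # a very small, non-zero number
```

A difference of this size is harmless in a single statement, but it is fed back into every later computation that uses the value.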
The WinFPT run-time trace facility, and the library of support routines distributed with WinFPT, are used to analyse this issue.
Suppose that we wish to compare runs under two compilers, for example, gfortran and ifort. We want to know whether differences between
the runs are due to coding errors or just to numerical drift. The procedure is as follows:
- The program is instrumented to capture a run-time trace.
- It is built under ifort and run. A trace file is generated.
- It is built under gfortran.
It would now be possible to run the program again and compare the two trace files. This is usually not
practical. The trace files diverge because of numerical drift, and any differences due, for example, to coding errors are
hidden amongst the large number of differences due to drift. Instead:
- In the second run, under gfortran, the same subroutines which captured the trace of the first run read the trace file and compare
every value computed by gfortran with the value computed by ifort. If the values are the same, no action is taken. If the values
differ by more than a criterion amount, the difference is reported. The values computed in the second run are then overwritten
by the values from the first run. This prevents the accumulation of numerical drift so that the runs do not drift apart.
The run-time trace files also record a unique index which identifies each trace routine call. These indices are used to detect the
situation where the two program runs follow different paths. If this occurs, the second run terminates at once, with a report of
the point at which the two runs diverge.
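The record-then-compare procedure can be sketched as follows. This is an illustration of the idea in Python under stated assumptions, not the WinFPT implementation: the trace is held in a list rather than a file, and the names record, make_checker and PathDivergence are hypothetical.

```python
class PathDivergence(Exception):
    """Raised when the second run reaches a different trace call."""


def record(trace, call_id, value):
    """First run: store the call index and the value, return the value."""
    trace.append((call_id, value))
    return value


def make_checker(trace, differs, report):
    """Second run: compare with the recorded trace, then overwrite."""
    entries = iter(trace)

    def check(call_id, value):
        entry = next(entries, None)
        if entry is None or entry[0] != call_id:
            # The two runs have followed different paths: stop at once.
            raise PathDivergence(f"runs diverge at trace call {call_id}")
        ref_value = entry[1]
        if differs(value, ref_value):
            report.append((call_id, value, ref_value))
        return ref_value  # carry on with the first run's value

    return check


# First run: two instrumented assignments.
trace = []
a = record(trace, 1, 0.1 + 0.2)
b = record(trace, 2, a * 10.0)

# Second run: the first value differs only by rounding, the second
# contains a deliberate error.
report = []
check = make_checker(trace, lambda new, ref: abs(new - ref) > 1e-9, report)
a2 = check(1, 0.3)               # not reported, overwritten
b2 = check(2, a2 * 10.0 + 1.0)   # reported, overwritten
```

Because every value is replaced by the recorded one, the rounding difference in the first assignment never reaches the second, and the only entry in report is the genuine error at call 2.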
This technique has proved to be very powerful in detecting:
- Uninitialised variables
- Array references out-of-bounds
- Ill-defined orders of execution
- Compiler bugs
The detailed behaviour in the second, comparison run may be refined by writing an optional configuration file.
This file specifies:
- The criteria for comparing real numbers. Two criteria are specified, a relative criterion difference and an absolute criterion
difference. By default, the relative criterion difference is 1% and the absolute difference is 0.0001. A real number is reported
as different only if the difference exceeds both criteria. The requirement for an absolute criterion difference prevents the report of spurious
differences when values are close to zero.
- Whether integer and logical values are to be overwritten when differences are detected. Some programs use integers to store
file and database handles which are always different on different runs. If these are overwritten, the file or database handling
is likely to fail.
- The location of the trace file. These files may become very large and it may be necessary to store them on external devices.
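The two-criteria test for real numbers might look like the sketch below, using the default values quoted above. The function name real_differs is hypothetical, and measuring the relative difference against the larger of the two magnitudes is an assumption made for the example; the text does not say which value the percentage is taken from.

```python
def real_differs(new, ref, rel_criterion=0.01, abs_criterion=0.0001):
    """Return True only if the difference exceeds both criteria."""
    diff = abs(new - ref)
    # Assumption: the relative criterion is scaled by the larger magnitude.
    scale = max(abs(new), abs(ref))
    return diff > abs_criterion and diff > rel_criterion * scale


print(real_differs(100.0, 102.0))   # True: about 2% and well above 0.0001
print(real_differs(100.0, 100.5))   # False: under 1%
print(real_differs(1e-7, 2e-7))     # False: 50% apart, but below 0.0001
```

The last case shows why the absolute criterion is needed: two values close to zero can differ by a large percentage without the difference being meaningful.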
Copyright ©1995 to 2013 Software Validation Ltd. All rights reserved.