SimCon - Fortran Analysis, Engineering & Migration

WinFPT - Relative Debugging and Tracing Program Execution
 
Tracing Execution

When a program crashes or runs incorrectly, an interactive debugger can often show the bad data at the site of the crash. It is sometimes difficult to find where the bad data has come from. The FPT run-time trace facility is a technology for tracing arithmetic errors.

FPT instruments selected components of the code to capture all left-hand-side scalar quantities to file. Please click here for an example program which models a cannon ball with square-law drag.

This code is modified by FPT to capture the outputs of every statement. Click here for the modified code.

The routine trace_start_sub_program logs the start of a program, subroutine or function. The FPT library routines trace_r4_data, trace_i4_data, etc. write the left-hand-side data to file. These routines are self-initialising: the log file is created on the first call if it does not already exist.
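
The self-initialising behaviour can be illustrated with a minimal Python sketch. This is not the FPT library, which is called from Fortran. The class name TraceLog, its method names, the sub-program name and the line format are all assumptions made for illustration only.

```python
import os
import tempfile


class TraceLog:
    """Hypothetical stand-in for a self-initialising trace log."""

    def __init__(self, path):
        self.path = path
        self._file = None  # nothing is opened until the first trace call

    def _ensure_open(self):
        # Open the log (creating it if it does not exist) on first use only.
        if self._file is None:
            self._file = open(self.path, "a")
        return self._file

    def start_sub_program(self, name):
        # Record entry to a program, subroutine or function.
        self._ensure_open().write(f"ENTER {name}\n")

    def data(self, name, value):
        # One left-hand-side value per line (the format is assumed).
        self._ensure_open().write(f"{name} = {value!r}\n")

    def close(self):
        if self._file is not None:
            self._file.close()
            self._file = None


def demo():
    path = os.path.join(tempfile.mkdtemp(), "trace.log")
    log = TraceLog(path)
    existed_before = os.path.exists(path)  # the file is not created yet
    log.start_sub_program("cannon")        # the first call creates the file
    log.data("x", 1.5)
    log.close()
    with open(path) as f:
        return existed_before, f.read().splitlines()
```

The caller never opens the log explicitly, so instrumented statements can be inserted anywhere in the code without a separate set-up step.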

The output shows entries to sub-programs, and the left-hand-side quantities, written one to each line. Click here to see the trace output. The main loop, starting with the copy of hddot to p_hddot and ending with the computation of the new value of x, can clearly be seen.

If a program reads or generates bad data, it is usually possible to trace back to the cause of the problem. The output files can become very large, so we recommend that only a small sub-set of the files in a large program should be instrumented in this way.

Please see the FPT command-line reference page for a description of the procedure for using this facility.

 
Relative Debugging - Removing Numerical Drift

The same program may produce significantly different results when it is built with different compilers or with different levels of optimisation under the same compiler. The differences may be due only to numerical drift. This occurs when different systems choose different orders of execution, or different variables to store in processor registers, with the result that there are small differences in rounding errors. These differences accumulate and eventually affect the results. However, the differences may also be due to compiler bugs or to coding errors which behave differently in different environments.
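
The effect of evaluation order on rounding can be seen in a few lines. The sketch below is in Python rather than Fortran and is only an illustration: it adds the same three values in two different orders, as two compilers might when they reorder an expression.

```python
def sum_in_order(values):
    """Add the values strictly left to right in double precision."""
    total = 0.0
    for v in values:
        total += v
    return total


values = [0.1, 0.2, 0.3]
forward = sum_in_order(values)             # (0.1 + 0.2) + 0.3
backward = sum_in_order(reversed(values))  # (0.3 + 0.2) + 0.1

# The two results differ in the last bit although the arithmetic is
# mathematically identical.
drift = abs(forward - backward)
```

A difference of this size is harmless in a single statement. In an iterative simulation it is fed back into every later step, which is how the two runs drift apart.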

The WinFPT run-time trace facility, and the library of support routines distributed with WinFPT, are used to analyse this issue. Suppose that we wish to compare runs under two compilers, for example, gfortran and ifort. We want to know whether differences between the runs are due to coding errors or just to numerical drift. The procedure is as follows:

  • The program is instrumented to capture a run-time trace.
  • It is built under ifort and run. A trace file is generated.
  • It is built under gfortran.

    It would now be possible to run the program again and compare the two trace files. This is usually not practical. The trace files drift apart because of numerical drift, and any differences due, for example, to coding errors are hidden amongst the large number of differences due to drift. Instead:

  • In the second run, under gfortran, the same subroutines which captured the trace of the first run read the trace file and compare every value computed by gfortran with the value computed by ifort. If the values are the same, no action is taken. If the values differ by more than a specified criterion, the difference is reported. The values computed in the second run are then overwritten by the values from the first run. This prevents the accumulation of numerical drift so that the runs do not drift apart.

The run-time trace files also record a unique index which identifies each trace routine call. These indices are used to detect the situation where the two program runs follow different paths. If this occurs, the second run terminates at once, with a report of the point at which the two runs diverge.
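
The logic of the comparison run can be sketched as follows. This is a simplified Python model, not the WinFPT library: the names make_comparer and PathDivergence, the in-memory list standing in for the trace file, and the simple absolute tolerance used in the example are all assumptions.

```python
class PathDivergence(Exception):
    """Raised when the two runs reach different trace calls."""


def make_comparer(reference, differs, report):
    """Return the function called at each trace point of the second run.

    reference: (call_index, value) pairs recorded by the first run
    differs:   function(ref_value, new_value) -> bool
    report:    function called with a message for each reported difference
    """
    position = 0

    def compare(call_index, value):
        nonlocal position
        if position >= len(reference):
            raise PathDivergence(f"trace exhausted at call {call_index}")
        ref_index, ref_value = reference[position]
        if ref_index != call_index:
            # The runs have followed different paths: stop at once.
            raise PathDivergence(
                f"expected call {ref_index}, reached call {call_index}")
        position += 1
        if differs(ref_value, value):
            report(f"call {call_index}: {value!r} differs from {ref_value!r}")
        # Always continue with the first run's value to suppress drift.
        return ref_value

    return compare


reference = [(1, 1.0), (2, 2.0), (3, 3.0)]
messages = []
compare = make_comparer(
    reference, lambda ref, new: abs(ref - new) > 1e-4, messages.append)

a = compare(1, 1.0 + 1e-9)  # small drift: replaced without a report
b = compare(2, 2.5)         # large difference: reported, then replaced

try:
    compare(7, 3.0)         # wrong call index: the paths have diverged
    diverged = False
except PathDivergence:
    diverged = True
```

Because every value is replaced by the reference value, a reported difference reflects an error at that statement alone and not drift carried over from earlier statements.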

This technique has proved to be very powerful in detecting:

  • Uninitialised variables
  • Array references out-of-bounds
  • Ill-defined orders of execution
  • Compiler bugs
 
Relative Debugging - Refining the Comparisons

The detailed behaviour in the second, comparison run may be refined by writing an optional configuration file. This file specifies:

  • The criteria for comparing real numbers. Two criteria are specified, a relative criterion difference and an absolute criterion difference. By default, the relative criterion difference is 1% and the absolute criterion difference is 0.0001. A real number is reported as different only if the difference exceeds both criteria. The requirement for an absolute criterion difference prevents the report of spurious differences when values are close to zero.
  • Whether integer and logical values are to be overwritten when differences are detected. Some programs use integers to store file and database handles which are always different on different runs. If these are overwritten, the file or database handling may fail.
  • The location of the trace file. These files may become very large and it may be necessary to store them on external devices.
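
The two-criterion test for real numbers can be sketched as below, using the default values quoted above. The function name reals_differ is hypothetical, and measuring the relative difference against the magnitude of the first-run value is an assumption; the source does not say which value is used as the base.

```python
def reals_differ(reference, value, rel_tol=0.01, abs_tol=0.0001):
    """Return True only if the difference exceeds BOTH criteria."""
    diff = abs(value - reference)
    # Assumption: the relative criterion is scaled by the reference value.
    return diff > abs_tol and diff > rel_tol * abs(reference)


within_relative = reals_differ(100.0, 100.5)  # 0.5% of 100: not reported
clearly_wrong = reals_differ(100.0, 102.0)    # 2% and well above 0.0001
near_zero = reals_differ(1e-6, 3e-6)          # 200% relative, tiny absolute
```

The third case shows why the absolute criterion is needed: the relative difference is large, but both values are effectively zero, so nothing is reported.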

Copyright ©1995 to 2013 Software Validation Ltd. All rights reserved.