Optimisation for Speed of Execution
Optimisations Carried Out by fpt
Four optimisations are available:
In-line Expansion
There is an overhead in calling a subroutine or invoking a function but, sometimes more importantly, sub-program calls may seriously interfere with compiler optimisations. fpt carries out in-line expansion of subroutines and functions chosen by the user. The effect of in-line expansion depends on the host architecture, because there are trade-offs between processor speed, memory speed and instruction cache size. Run times typically improve by 20% to 40%, and by a factor of two or more on high-performance processors held back by memory speed.
Compilers will usually in-line internal sub-programs. The same optimisation can therefore be achieved, with far less disturbance to the code, by inserting contained INCLUDE files as described below.
Please see an example of in-line expansion here.
In-line expansion can be carried out quickly and easily, so we recommend that users maintain their code in unexpanded form. fpt may therefore be used to resolve a common conflict in the maintenance of large programs. It is good practice to encapsulate small, frequently used operations, such as accesses to data structures, in subroutines or functions, so that maintenance changes are made in only one place. However, a large number of sub-program calls slows down program execution. Users may therefore encapsulate primitive operations and expand the calls in-line before releasing production code.
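The Python sketch below illustrates the idea with a toy version of in-line expansion on Fortran source text. It is not fpt's implementation. The function `inline_call` is hypothetical, and so is its input format: a single-expression function given as a name, a list of dummy arguments and a body expression.

The sketch replaces each call in a statement with the body and substitutes the actual arguments for the dummy arguments. Every actual argument is wrapped in parentheses so that operator precedence is preserved. Real in-line expansion must also handle several things this sketch does not attempt:

* multi-statement bodies
* local variables
* side effects
* name clashes
* Fortran's case insensitivity

```python
import re


def inline_call(stmt, name, dummies, body):
    """Toy in-line expansion: replace every call NAME(a1, a2, ...) in stmt
    with body, substituting the parenthesised actual arguments for the
    dummy arguments.

    Assumptions (illustration only): the function is a single expression,
    actual arguments contain no commas or nested parentheses, and names
    are matched case-sensitively.
    """
    call_re = re.compile(rf"\b{re.escape(name)}\(([^()]*)\)")
    dummy_re = re.compile(r"\b(" + "|".join(map(re.escape, dummies)) + r")\b")

    def expand(match):
        actuals = [a.strip() for a in match.group(1).split(",")]
        if len(actuals) != len(dummies):
            raise ValueError(f"{name} expects {len(dummies)} argument(s)")
        mapping = dict(zip(dummies, actuals))
        # One pass over the body, so an actual argument that happens to
        # contain a dummy name is never substituted a second time.
        expanded = dummy_re.sub(lambda m: "(" + mapping[m.group(1)] + ")", body)
        return "(" + expanded + ")"

    return call_re.sub(expand, stmt)
```

For example, `inline_call("Y = SQUARE(X+1) + 2.0", "SQUARE", ["V"], "V*V")` gives `Y = ((X+1)*(X+1)) + 2.0`. Without the added parentheses the substitution would produce `X+1*X+1`, which means something different.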
Please see EXPAND INLINE in the reference manual for a description of the fpt commands.
Unwinding Loops
Unwinding (or un-rolling) loops optimises the code by:
Compilers will do this automatically for small loops, but the fpt commands provide the user with control over the process.
As with in-line expansion, we recommend maintaining the code with the loops in place and unwinding them only for a production release.
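The general shape of partial loop unwinding can be sketched in Python. The sketch assumes a simple dot-product loop and an unwinding factor of four, both chosen purely for illustration; the function names are hypothetical and this is not necessarily the form fpt generates.

The unwound version does four iterations' worth of work per pass, so the loop test and branch run roughly a quarter as often. A short remainder loop then handles the last `n % 4` elements. Python itself gains little from this; the sketch only shows the structure of the transformed loop.

```python
def dot_rolled(x, y):
    """Reference version: one multiply-add per loop iteration."""
    total = 0.0
    for i in range(len(x)):
        total += x[i] * y[i]
    return total


def dot_unwound4(x, y):
    """Same result with the loop unwound by a factor of four."""
    n = len(x)
    limit = n - n % 4  # largest multiple of 4 not exceeding n
    total = 0.0
    for i in range(0, limit, 4):
        # Four copies of the loop body per pass. The additions happen in the
        # same order as in dot_rolled, so the floating-point result is identical.
        total += x[i] * y[i]
        total += x[i + 1] * y[i + 1]
        total += x[i + 2] * y[i + 2]
        total += x[i + 3] * y[i + 3]
    for i in range(limit, n):  # remainder loop: at most three iterations
        total += x[i] * y[i]
    return total
```

Keeping the original accumulation order is a deliberate choice here. Unwinding into several independent partial sums can expose more parallelism, but it changes the rounding of floating-point results.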
Please see an example of loop unwinding here.
Scalarisation
When code is scalarised, all arrays are replaced by lists of scalar variables, and all loops referencing the arrays are unwound. Scalarisation is therefore an extreme case of loop unwinding. However, the complete removal of array references may open further opportunities for code optimisation on parallel systems.
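A toy Python sketch of the idea, operating on Fortran source text, is shown below. First a loop with constant bounds is fully unwound. Then each array element with a constant subscript is replaced by a scalar variable.

Several things in this sketch are assumptions made for illustration:

* the function `scalarise_loop`
* the `NAME_k` naming scheme for the new scalars
* the restriction to non-negative constant bounds, unit stride and plain `NAME(I)` subscripts

fpt's own naming and declaration of the scalars may differ.

```python
import re


def scalarise_loop(index, first, last, body, arrays):
    """Toy scalarisation of 'DO index = first, last' (unit stride).

    The loop is fully unwound by substituting each constant value of the
    index into every statement of body. Each reference NAME(k) to an array
    listed in `arrays`, where k is a constant subscript, is then replaced by
    a scalar named NAME_k. Subscript expressions such as A(I+1) become
    A(3+1) and are left as array references by this sketch.
    """
    if first < 0:
        raise ValueError("this sketch assumes non-negative loop bounds")
    index_re = re.compile(rf"\b{re.escape(index)}\b")
    element_re = re.compile(
        r"\b(" + "|".join(map(re.escape, arrays)) + r")\((\d+)\)")
    statements = []
    for k in range(first, last + 1):
        for stmt in body:
            unwound = index_re.sub(str(k), stmt)                   # A(I) -> A(3)
            statements.append(element_re.sub(r"\1_\2", unwound))  # A(3) -> A_3
    return statements
```

For example, `scalarise_loop("I", 1, 3, ["C(I) = A(I) + B(I)"], ["A", "B", "C"])` returns `["C_1 = A_1 + B_1", "C_2 = A_2 + B_2", "C_3 = A_3 + B_3"]`. The result is three independent scalar assignments, with no loop and no array indexing.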
Scalarisation was not originally intended as an optimisation tool. The authors were working on a project to map Fortran code to a fine-grained parallel machine architecture, and scalarisation was a necessary step in transforming the code. The scalarised codes were compiled and run to check that nothing had changed, and we noticed significant speed improvements. The fine-grained parallel system is described in:
Farrimond B.T., Collins J. and Sharma A. (2008), "APPRASE: Automatic parallelisation of Fortran to run on an FPGA", paper presented at The Summer Conference of the Society for Computer Simulation, Edinburgh, Scotland, June 2008.
Wyngaard J., Inggs M., Collins J. and Farrimond B. (2013), "Towards a many-core architecture for HPC", 23rd International Conference on Field Programmable Logic and Applications (FPL).
Please see SCALARISE in the fpt reference manual.
Service Routines in Contained INCLUDE Files
Many large codes have significant numbers of small service routines, and common practice is to place them in modules. However, the code of the routines is not visible to the compiler when the sub-programs that use the modules are compiled, so the routines are not expanded in-line.
The recommended strategy is to place the service routines in INCLUDE files. At present this must be done manually. fpt can then write a CONTAINS statement in every top-level routine and every module that does not have one, and insert the INCLUDE file with the service routines into each CONTAINS section. The service routines then become internal sub-programs, and compilers can expand them in-line wherever they are called.
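The source change described above can be sketched in Python as a toy text transformation of a single Fortran program unit. The function `insert_contained_include` and the file name `services.inc` are hypothetical. The sketch assumes free-form source, a unit whose last line is its END statement, and no nested CONTAINS sections. fpt's actual placement and formatting of the inserted lines may differ.

```python
import re

# A CONTAINS statement on a line of its own (Fortran keywords are case-insensitive).
CONTAINS_RE = re.compile(r"\s*CONTAINS\s*", re.IGNORECASE)


def insert_contained_include(unit, include_file):
    """Return a copy of `unit` (a list of source lines for one program unit)
    with an INCLUDE line for `include_file` placed in its CONTAINS section,
    just before the END statement. A CONTAINS statement is added first if
    the unit does not already have one."""
    body, end_stmt = list(unit[:-1]), unit[-1]
    if not any(CONTAINS_RE.fullmatch(line) for line in body):
        body.append("CONTAINS")
    body.append(f"  INCLUDE '{include_file}'")
    return body + [end_stmt]
```

The behaviour depends on whether the unit already has a CONTAINS section:

* For a subroutine without one, the function adds `CONTAINS` and `  INCLUDE 'services.inc'` just before `END SUBROUTINE`.
* For a module that already has module procedures, it adds only the INCLUDE line.

In both cases the included service routines become internal sub-programs of that unit.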
This strategy was used in fpt, which is itself written in Fortran and has about 500 service routines. The result was a 30% improvement in run speed.
Please see INSERT CONTAINED INCLUDE in the fpt reference manual.
Copyright ©1995 to 2025 Software Validation Ltd. All rights reserved.