Random rounding is a numerical technique

- to estimate the loss of numerical accuracy due to floating point arithmetic in arbitrary programs (provided you have the source code)
- to locate causes of numerical instabilities in the code
- to check the impact of isolated sub-units of the program on the final calculation results

We presented our version of the technique in

G. Knizia, W. Li, S. Simon, H.-J. Werner; J. Chem. Theory Comput. 7, 2387 (2011)

with applications to the Molpro quantum chemistry package, a numerical program with several million lines of mixed Fortran 77, Fortran 90, C, and C++ code. We were able to obtain accurate error bars on final numbers calculated by numerical programs consisting of 10 000s of lines of code. The meta-compiler script we described can be found below. The following text gives a short overview of the technique. More information and references to related work of other authors can be found in the article.

When a FP operation is executed, in general
the exact result of the computation cannot be represented in a FP number,
since FP numbers have limited resolution. The result of the computation
thus needs to be rounded to a nearby representable number (by default, to the nearest one).

Under unfortunate conditions, rounding errors can accumulate and amplify
in non-transparent ways and lead to badly inaccurate final results. For example:

Let
  a = 1.4142136
  b = -1.4142136
  c = 1.2345678 * 10^-5
Then
  c_1 = (a + b) + c = 1.2345678 * 10^-5
  c_2 = a + (b + c) = 1.2397766 * 10^-5
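This loss of digits is easy to reproduce. Here is a minimal Python sketch of the example above (the helper f32 is hypothetical and simply emulates the single-precision rounding implied by the 8-digit constants):

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest IEEE-754 single
    precision value, emulating a single precision computation."""
    return struct.unpack('f', struct.pack('f', x))[0]

a = f32(1.4142136)
b = f32(-1.4142136)
c = f32(1.2345678e-5)

# (a + b) + c: a and b cancel exactly to 0.0, so c survives unharmed.
c1 = f32(f32(a + b) + c)
# a + (b + c): b + c rounds away most digits of c before a is added,
# so the result agrees with c1 in only its leading digits.
c2 = f32(a + f32(b + c))
```

The two orders of summation give visibly different single-precision results, even though real-number arithmetic would make them identical.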

While here it is quite obvious what happened, in large programs
such errors can occur buried deep in nested procedures, without
any indication that the final result is inaccurate (you just get a number, but no information on how many of its digits are significant).
Note that all effective subtractions of large FP numbers resulting in a small result lead to relative accuracy loss.
Also, often such problems show up only in large calculations which
cannot be easily debugged by hand, or when uncommon calculation
parameters are used.
In general it is very hard to find out how accurate calculated FP numbers are.

By default the FPU rounds the result of each floating point operation to the nearest representable floating point number. Random rounding analysis works by changing that behavior: before each FP operation, a small code segment is executed which randomly changes the rounding mode to UP or DOWN. This induces artificial numerical noise which is of the same order of magnitude as the inherent FP rounding error, but in a random direction. Since this is done for each FP operation, a proper numerical error propagation is emulated.
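The idea can be emulated in pure Python (a hypothetical sketch only, not f95rr's mechanism: instead of switching the FPU rounding mode, the correctly rounded result of each operation is nudged to its upper or lower floating point neighbor at random via math.nextafter, available in Python 3.9+):

```python
import math
import random

def rr(x, rng):
    """Return x displaced to a randomly chosen neighboring FP value --
    noise of the order of the inherent rounding error of x, in a
    random direction."""
    return math.nextafter(x, math.inf if rng.random() < 0.5 else -math.inf)

def noisy_sum(seed):
    """Sum x*sin(x) for x = 10000..1, applying random rounding after
    each FP operation."""
    rng = random.Random(seed)
    f = 0.0
    for i in range(10000, 0, -1):
        x = float(i)
        f = rr(f + rr(x * math.sin(x), rng), rng)
    return f

# Each run scatters around the round-to-nearest result.
results = [noisy_sum(seed) for seed in range(8)]
```

Unlike this sketch, f95rr does not touch the source code at all; it rewrites the compiler-generated assembly, so the whole program is instrumented automatically.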

Subsequently, multiple runs of the entire program will give different
results. These results can be analyzed statistically to estimate
the effects of the accumulated rounding errors on the final result.

For example, suppose eight random-rounding runs of a program produce the results

  E_1 = 0.99999997   E_2 = 1.00000656   E_3 = 1.00000134   E_4 = 1.00000670
  E_5 = 0.99999066   E_6 = 0.99999421   E_7 = 0.99999345   E_8 = 1.00000280

You can evaluate the mean result and root mean square deviation as

  mean(E) = 0.999999459745   rms(E) = 5.687e-06

and conclude that your final calculation result is in reality

  E = (0.999 999 5 +/- 0.000 011 4)   (double confidence interval)

(i.e., only about -log10(0.0000114) ≈ 4.9 of its digits are significant), rather than

  E = 1.000 003 129 723 ...

as a single straight run of the program (without random rounding) might want to make you believe. More information on the background and the pitfalls of the technique can be found in the article.
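These statistics are straightforward to reproduce. A minimal Python sketch using the eight E_i values quoted above (the rms is taken as the population rms, dividing by N; the last digits differ slightly from the article's numbers because the E_i are quoted rounded to 8 decimals):

```python
import math

# The eight random-rounding results E_1..E_8 from the text.
E = [0.99999997, 1.00000656, 1.00000134, 1.00000670,
     0.99999066, 0.99999421, 0.99999345, 1.00000280]

mean = sum(E) / len(E)
rms = math.sqrt(sum((x - mean) ** 2 for x in E) / len(E))

print(f"mean(E) = {mean:.12f}")  # ~0.999999461250
print(f"rms(E)  = {rms:.3e}")    # ~5.689e-06
print(f"2*rms   = {2*rms:.3e}")  # half-width of the double confidence interval
```

2*rms gives the +/- 0.0000114 quoted above, which is what you should report instead of the digits of a single round-to-nearest run.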

The f95rr script implements random rounding by instrumenting the assembly code emitted by the compiler. At its core lies a table of 2^16 FPU control
words (integers which control how the FPU works in subsequent operations;
in particular, its calculation precision and its rounding mode). This table is also
added to the compiled program. This table can be changed at run time
by function calls to obtain a variety of behaviors (see below).

Concretely, the script can be invoked as a Fortran compiler or C++ compiler and will then do the following:

- Interpret the command line. Invoke the actual compiler (e.g., g95, gfortran, or g++) to compile the input files to assembly code (additionally, it adds compiler flags for producing x87 FPU code instead of SSE code).
- Scan the assembly code for the program entry point. If a program entry point is found, insert code for initializing the rounding mode table (LC_CWDATA).
- Scan the assembly code for x87 FPU instructions which require a change of the rounding mode. These are all addition/subtraction and multiplication/division operations. Before each such operation, the following assembly code is inserted:

      xchg rax, [LC_CWINDEX]
      lea ax, [rax+1]
      fldcw word ptr [LC_CWDATA+2*rax]
      xchg rax, [LC_CWINDEX]

  This code first increments the current rounding mode table index (LC_CWINDEX) and then loads a new floating point control word into the FPU. This control word is taken from the 2^16-entry rounding mode table ([LC_CWDATA]) at the current index. The last xchg writes the new table index back and restores the original value of register rax. All instructions in this assembly fragment preserve the CPU flags register and do not affect other code (apart from the changed rounding mode, of course).
- Invoke the actual compiler again to compile the assembly files to binary object code and to link the executable.
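For illustration, such a control word table could be populated as in the following hypothetical Python sketch (this is not f95rr's actual code, which lives in cw_control.c and uses the JKISS PRNG; the x87 bit layout is from the Intel manuals, and reading "64 bit" as the x87's 64-bit-significand extended precision is an assumption):

```python
import random

# x87 FPU control word bit fields (Intel SDM, vol. 1):
#   bits  0-5 : exception masks (1 = exception masked)
#   bits  8-9 : precision control (00 single, 10 double, 11 extended/64-bit)
#   bits 10-11: rounding control (00 nearest, 01 down, 10 up, 11 toward zero)
RC_DOWN, RC_UP = 0b01, 0b10
PC_EXTENDED = 0b11
BASE_CW = 0x003F | (PC_EXTENDED << 8)  # all exceptions masked, 64-bit precision

def make_rr_table(n=1 << 16, seed=None):
    """Fill an n-entry rounding mode table with control words whose
    rounding direction is chosen UP or DOWN at random."""
    rng = random.Random(seed)
    return [BASE_CW | (rng.choice((RC_DOWN, RC_UP)) << 10) for _ in range(n)]

table = make_rr_table(seed=1)
```

With 2^16 entries, the 16-bit index update (the lea ax instruction above) appears to wrap around the table automatically, so the injected fldcw can cycle through it indefinitely.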

- Open the script file f95rr.py and adjust the paths to the actual compiler you intend to use. The script has been tested with g95, gfortran, and g++ as main compilers. By default it assumes the compiler executables are /usr/bin/g++, /usr/bin/gcc, and /usr/bin/gfortran.
- Put the f95rr directory at the front of your PATH variable. In bash this can be achieved with

      export PATH=$HOME/programs/f95rr/:$PATH

  and in csh with

      setenv PATH $HOME/programs/f95rr/:$PATH

  (instead of $HOME/programs/f95rr/ put in the actual location to where the files were unpacked)

- Make symbolic links g95->f95rr.py, c++->f95rr.py, etc. in the directory containing f95rr.py. Such links are included in the .tar.gz file, but you might need to adjust them if they are not correctly unpacked.

As an example, consider the following program (example.F90):

  program bla
    implicit double precision (a-h,o-z)
    f = 0.0
    do i = 10000, 1, -1
      x = dble(i)
      f = f + x*dsin(x)
    end do
    print "(1x,A,2x,F20.15)", "final sum:", f
  end program

Running the compiler script should produce the following output:

  ~/dev/f95rr> gfortran example.F90; a.out
   * Setting default rounding; Random rounding PRNG:
   * JKISS [D. Jones, 'Good practice in (pseudo) random number generation..']
   final sum:    7186.139201355133991

This is the default output of the program (the exact double precision result), apart from the banner lines, which are emitted by cw_control.c and can be disabled if this is desired. It produces the same number as before (although more slowly..) because the rounding mode table is initialized to 64 bit, round-to-nearest. If we now want to measure the numerical stability of this process, we need to enable random rounding for the computation:

  program bla
    implicit double precision (a-h,o-z)
    ! from here on rounding mode table is double precision, random rounding.
    call set_random_rounding_64bit()
    f = 0.0
    do i = 10000, 1, -1
      x = dble(i)
      f = f + x*dsin(x)
    end do
    ! from here on rounding mode is back to normal (double precision, round to nearest)
    call set_default_rounding()
    print "(1x,A,2x,F20.15)", "final sum:", f
  end program

If the program is now recompiled (just as before) and executed multiple times, it will give different results:

  ~/dev/f95rr> gfortran example.F90; a.out
   * Setting default rounding; Random rounding PRNG:
   * JKISS [D. Jones, 'Good practice in (pseudo) random number generation..']
   final sum:    7186.139201355161276
  ~/dev/f95rr> a.out
   final sum:    7186.139201355102159
  ~/dev/f95rr> a.out
   final sum:    7186.139201355098521
  ~/dev/f95rr> a.out
   final sum:    7186.139201355199475
  ~/dev/f95rr> a.out
   final sum:    7186.139201355158548
  ~/dev/f95rr> a.out
   final sum:    7186.139201355080331
  ~/dev/f95rr> a.out
   final sum:    7186.139201355074874

This scattering can then be analyzed statistically to get a measure of the numerical stability (we recommend using root mean square scattering numbers).

- BLAS/LAPACK: If your program uses BLAS or LAPACK and you want to include those linear algebra routines in your numerical analysis (you probably want to, at least for BLAS), then those libraries also need to be recompiled with the f95rr script. For this task we recommend downloading the reference BLAS and LAPACK from http://www.netlib.org and compiling them with random rounding.

Floating Point Exceptions: If you run your program compiled with f95rr and you instantly get a floating point exception, then most likely the script was unable to find the program entry point while building the program, and thus did not insert the call which initializes the rounding mode table to sensible values. If this happens, you need to call the function "initialize_cw_table" manually somewhere at the start of your program (before any floating point operations are executed).

Random rounding mean value not within +/- 2*rms of the round-to-nearest result: See chapter "Pitfalls of random rounding analysis" in the article.

Linking problems: If you get unresolved symbols "init_cw_table__" or similar, open the f95rr.py file and set "second_underscore" to True.

If the program fails to link at all: This may require logic readjustments in f95rr; parsing the compiler command line 100% correctly is not easy due to gcc's creative syntax.

Does not work in multithreaded code: Sorry, the injected code is not thread safe at this moment.

Please address feedback to Gerald Knizia via knizia(at)theochem.uni-stuttgart.de. I'll try to help get the script to work if I can. It's still experimental code and may take some tweaking to get working (but that is most likely still much easier than other forms of numerical stability analysis!).

I would greatly appreciate it if you would occasionally slip in a citation of the article if you used the f95rr script in your scholarly work and found it useful.

-- C. Gerald Knizia, 2011. Last Update: 14. Jul. 2011