Random Rounding Numerical Stability Analysis

Random rounding is a numerical technique

to estimate the loss of numerical stability due to floating point arithmetic in arbitrary programs (provided you have the source code)
to locate causes of numerical instabilities in the code
to check the impact of isolated sub-units of the program on the final calculation results

We presented our version of the technique in

G. Knizia, W. Li, S. Simon, H.-J. Werner; J. Chem. Theory Comput. 7 2387 (2011)

with applications to the Molpro quantum chemistry package, a numerical program with several million lines of mixed Fortran 77, Fortran 90, C and C++ code. We were able to obtain accurate error bars on final numbers calculated by numerical programs consisting of 10 000s of lines of code.

The meta-compiler script we described can be found below. The following text gives a short overview over the technique. More information and references to related work of other authors can be found in the article.

Contents:

1. Intro to floating point (FP) rounding

2. Intro to random rounding analysis

3. What the f95rr script does (randrom rounding meta-compiler)

4. Download f95rr script

5. Usage of f95rr script

6. Feedback

1. Intro to floating point (FP) rounding

When a FP operation is executed, then in general the exact result of the computation cannot be represented in a FP number, since FP numbers have limited resolution. The result of the computation thus needs to be rounded to a representable number. By default the processor rounds to the nearest FP number, but it can also to be told to round up (use as result the smallest FP number which is at least as large as the exact result) or down (use as result the largest FP number which is at most as large as the exact result).

Under unfortunate conditions, rounding errors can accumulate and amplify in non-transparent ways and lead to very inaccurate final calculation results. They also lead to FP arithmetic violating many familiar mathematical identities, like associativity laws. For example, let us assume 32bit floating point artihmetic. Then:

   Let
      a =  1.4142136
      b = -1.4142136
      c =  1.2345678 * 10^-5
   Then
      c₁ = (a + b) + c = 1.2345678 * 10^-5
      c₂ = a + (b + c) = 1.2397766 * 10^-5

While here it is quite obvious what happened, in large programs such errors can occur burried deep in nested procedures, and without any indication that the final result is inaccurate (you just get a number, but no information on how many of its digits are significant). Note that all effective subtractions of large FP numbers resulting in a small result lead to relative accuracy loss. Also, often such problems show up only in large calculations which cannot be easily debugged by hand, or when uncommon calculation parameters are used. In general it is very hard to find out how accurate calculated FP numbers are. Our technique allows putting error bars on calculated FP numbers, by simply re-compiling the program and running it multiple times. Even if the final number is the result of 100 000s of lines of FP computations!

2. Intro to random rounding analysis

By default the FPU will round results of floating point operations to the nearest representable floating point number. Random rounding analysis works by changing that behavior: Before each FP operation, a small code segment is executed which changes the rounding mode to UP or DOWN randomly. Due to this, an artificial numerical noise is induced, which is of the same order of magnitude as the inherent FP rounding error, but in a random direction. Since this is done for each FP operation, a proper numerical error propagation is emulated.

Subsequently, multiple runs of the entire program will give different results. These results can be analyzed statistically to estimate the effects of normal rounding on the final calculation results. For example, if you run an calculation with random rounding which gives in eight identical runs the following eight results:

  E₁ = 0.99999997
  E₂ = 1.00000656
  E₃ = 1.00000134
  E₄ = 1.00000670
  E₅ = 0.99999066
  E₆ = 0.99999421
  E₇ = 0.99999345
  E₈ = 1.00000280

You can evaluate the mean result and root mean square deviation as

   mean(E) = 0.999999459745
   rms(E) =  5.687e-06

and conclude that your final calculation results is in reality

   E = (0.999 999 5 +/- 0.000 011 4)    (double confidence interval)

(i.e., there are only -log₁₀(2*5.687e-06)=4.9 significant digits) and not, for example,

   E = 1.000 003 129 723 ...

as a single straight run of the program (without random rounding) might want to make you believe. More information on the technique, the background, and the pitfalls of the technique can be found in the article.

3. What the f95rr script does

The f95rr script implements random rounding by acting as as meta-compiler and inserting the random rounding code fragments (see below) into the compiled code before FP operations. The code fragments refer to a table of 2¹⁶ FPU control words (integers which control how to FPU works in subsequent operations; in particular, its calculation precision and its rounding mode). This table is also added to the compiled program. This table can be changed at run time by function calls to obtain a variety of behaviors (see below).

Concretely, the script can be invoked as a Fortran compiler or C++ compiler and will then do the following:

Interpret the command line. Invoke the actual compiler (e.g., g95, gfortan or g++) to compile the input files to assembly code (*additionally, it adds compiler flags for producing x87 FPU code instead of SSE code)
Scan the assembly code for the program entry point. If a program entry point is found, insert code for initializing the rounding mode table (LC_CWDATA).
Scan the assembly code for x87 FPU instructions which require a change of the rounding mode. These are all addition/subtraction and multiplication/division operations. Before each such operation, the following assembly code is inserted:
```
   xchg rax, [LC_CWINDEX]
   lea ax, [rax+1]
   fldcw word ptr [LC_CWDATA+2*rax]
   xchg rax, [LC_CWINDEX]
```
This code first increments the current rounding mode table index (LC_CWINDEX) and then loads a new floating point control word to the FPU. This control word is taken from the 2¹⁶-entry rounding mode table table ([LC_CWDATA]) at the current index. The last xchg writes the new table index back and restores the original register rax value. All instructions in this assembly fragment preserve the CPU flags register and do not affect other code (apart from the changed rounding mode of course).
Invoke the actual compiler again to compile the assembly files to binary object code and to link the executable

4. Download f95rr script

click here (version: 2011-07-14)

5. Usage of f95rr script

Before running the script:

Open the script file f95rr.py and adjust the paths to the actual compiler you intend to use. The script has been tested with g95, gfortran, and g++ as main compilers. By default it assumes the compiler executables are /usr/bin/g++, /usr/bin/gcc, and /usr/bin/gfortran.
Put the f95rr directory as top level in your path variable. On bash this can be achieved with
```
export PATH=$HOME/programs/f95rr/:$PATH
```
and on csh with
```
setenv PATH $HOME/programs/f95rr/:$PATH
```
(instead of $HOME/programs/f95rr/ put in the actual location to where the files were unpacked)

Other things you might need to do:

Make symbolic links g95->f95rr.py, c++->f95rr.py etc. in the f95rr.py directory. Such links are included in the .tar.gz file, but you might need to adjust them if they are not correctly unpacked.

Re-compiling your program:

If the script is in your top-level PATH variable, invoking g++ or gfortran in some other directory should execute the f95rr.py script. We provided two example programs (example.F90 and example.cpp) to demonstrate the intended usage of the program. For example, let us consider 'example.F90':

program bla
   implicit double precision (a-h,o-z)
   f = 0.0
   do i = 10000, 1, -1
      x = dble(i)
      f = f + x*dsin(x)
   end do
   print "(1x,A,2x,F20.15)", "final sum:", f
end program

Running the compiler script should produce the following output:

~/dev/f95rr> gfortran example.F90; a.out
* Setting default rounding; Random rounding PRNG: * JKISS [D. Jones, 'Good practice in (pseudo) random number generation..']
 final sum:  7186.139201355133991

This is the default output of the program (exact double precision result), apart from the first line, which is emitted by cw_control.c and can be disabled if this is desired. It produces the same number as before (although more slowly..) because the rounding mode table is initialized to 64bit, round-to-nearest. If we now want to measure the numerical stability of this process, we need to enable random rounding for the computation:

program bla
   implicit double precision (a-h,o-z)
   ! from here on rounding mode table is double precision, random rounding.
   call set_random_rounding_64bit()
   f = 0.0
   do i = 10000, 1, -1
      x = dble(i)
      f = f + x*dsin(x)
   end do
   ! from here on rounding mode is back to normal (double precision, round to nearest)
   call set_default_rounding()
   print "(1x,A,2x,F20.15)", "final sum:", f
end program

If the program is now recompiled (just as before) and executed multiple times, it will give different results:

~/dev/f95rr> gfortran example.F90; a.out
* Setting default rounding; Random rounding PRNG: * JKISS [D. Jones, 'Good practice in (pseudo) random number generation..']
 final sum:  7186.139201355161276
~/dev/f95rr> a.out
 final sum:  7186.139201355102159
~/dev/f95rr> a.out
 final sum:  7186.139201355098521
~/dev/f95rr> a.out
 final sum:  7186.139201355199475
~/dev/f95rr> a.out
 final sum:  7186.139201355158548
~/dev/f95rr> a.out
 final sum:  7186.139201355080331
~/dev/f95rr> a.out
 final sum:  7186.139201355074874

This scattering can then be analyzed statistically to get a measure of the numerical stability (we recommend using root mean square scattering numbers).

Notes:

BLAS/LAPACK: If your program uses BLAS or LAPACK and you want to include the those linear algebra routines in your numerical analysis (you probably want to, at least for BLAS), then those libraries also need to be recompiled using the f95rr script. For this task we recommend downloading the reference BLAS and LAPACK from http://www.netlib.org and compiling it with random rounding.

Some common problems:

Floating Point Exceptions:
If you run your program compiled with f95rr and you instantly get a floating point exception, then most likely the script was unable to find the program entry point while building the program, and thus did not insert the call to initialize the rounding mode table to sensible values. If this happens, you need to call the function "initialize_cw_table" manually somewhere at the start of your program (before floating point operations are executed).
Random rounding mean value not within +/- 2*rms of the round-to-nearest result:
See chapter "Pitfalls of random rounding analysis" in in the article
Linking problems:

If you get unresolved symbols "init_cw_table__" or similar, open the f95rr.py file and set "second_underscore" to True.

If the program fails to do linking at all:
This may require logic readjustments in f95rr. Parsing the compiler commands 100% perfectly is not easy due to the creative gcc syntax.
Does not work in multithreaded code:
Sorry, the injected code is not thread safe at this moment.

6. Feedback

Please address feedback to Gerald Knizia via knizia(at)theochem.uni-stuttgart.de. I'll try to help to get the script to work if I can. It's still experimental code and may take some tweaking in order to get working (but that is most likely still much easier than other forms of numerical stability analysis!).

I would greatly appreciate it if you would occasionally slip in a citation for the article if you used the f95rr script in your scholary work and found it useful.

-- C. Gerald Knizia, 2011. Last Update: 14. Jul. 2011