Debugging Sensitivity to C math library on Linux



C math library functions, such as exp or sin, are not guaranteed to be “precise”. The results might be slightly different on different platforms. A recent change in mingw-w64 v12, which is a core dependency of compilers we use on Windows (both GCC and LLVM), resulted in failures in about 20 CRAN packages. Rtools45 uses mingw-w64 v11, the same version as Rtools44, to allow time for resolving the issues. The change in mingw-w64 v12 switched about 90 math functions from internal implementations to UCRT, the Windows C runtime provided by Microsoft and shipped with the OS. The goal of the change was improving performance and reducing maintenance cost of mingw-w64.

In principle, this is not only a problem on Windows. It is not hard to come across R package tests, which test that numerical results of some computation is within bounds, when the bounds were apparently established experimentally on several platforms at hand at the time the tests were written. These tests then commonly start failing on a new CPU or with a new compiler, and they may be difficult to debug, especially to a person who is not a maintainer of the package.

Section 1.6 Writing portable packages of Writing R Extensions has more details in th paragraph stating with:

Only test the accuracy of results if you have done a formal error analysis. Things such as checking that probabilities numerically sum to one are silly: numerical tests should always have a tolerance. That the tests on your platform achieve a particular tolerance says little about other platforms. R is configured by default to make use

This blog post presents the mfuzz tool for Linux, which fuzzes the results from C Math functions. So if establishing tolerance intervals empirically, despite the suggestion cited above, one might use the tool to get somewhat wider intervals. As of this writing, the tool fuzzes the results deterministically, adding 1 ULP (unit of least precision) towards positive infinity, but if that were to change the sign of the result, then away from zero.

The tool tries to be reasonable about fuzzing and tries to preserve certain properties which would likely hold on real C Math library implementations and which hold in Mathematics, such as that sin(0) == 0, cos(0) == 1 (exactly), etc. In principle, rounding functions are also C Math library functions, but they are indeed not fuzzed - one would normally only fuzz transcendental functions (via environment variable MFUZZ_FUZZ=trans). Making the tool sufficiently reasonable in what is fuzzed is joint work with Martin Maechler from R Core.

In principle, the tool could be extended to make the fuzzing somewhat random, or bigger for some functions and some intervals. The 1 ULP fuzzing has been motivated by that all over-sensitivity bugs found in R CRAN packages when debugging issues with mingw-w64 v12 were caused by differences in numerical results of only 1 ULP.

Interested readers are referred to previous blog posts on the topic fromJanuary from April and May.

Wrapping Math library calls

The tool (mfuzz) implements configurable wrappers most of the C Math library functions. These are built into a shared library, which is then provided to the dynamic linker via LD_PRELOAD and is used in precedence of the system C Math library. The wrappers in wrapppers.c are generated by an R script genwrappers.r, so it is best to modify the script when changes are needed.

Unlike the mdebug tool, which is for Windows, one doesn’t have to rebuild R to use mfuzz, which is for Linux.

Example

Let’s have this R test for a demonstration how the tool can be used:

x <- -1e4:1e4
y <- sin(x)^2 + cos(x)^2
stopifnot(max(1-y) <= .Machine$double.eps)

This is an example of an unreasonable test to be in an R package, because it tests the R math functions only without any relevant added value/computation, and because the tolerance limit is arbitrary. Still, the test passes on my Linux, Windows 11 and macOS systems, but it fails on Linux with mfuzz:

env LD_PRELOAD=~/mfuzz/libmfuzz.so MFUZZ_FUZZ=trans ./bin/R
[...]
> x <- -1e4:1e4
y <- sin(x)^2 + cos(x)^2
stopifnot(max(1-y) <= .Machine$double.eps)
Error: max(1 - y) <= .Machine$double.eps is not TRUE
> 

Lets check which Math functions were called and fuzzed in the execution:

env LD_PRELOAD=~/mfuzz/libmfuzz.so MFUZZ_FUZZ=trans MFUZZ_LOG=trans MFUZZ_LOGFILE=mylog.log ./bin/R

The log file looks as follows:

$ head mylog.log
log10 ( 0.000000(0x1p-1022) ) -> -307.652656(-0x1.33a7146f72a41p+8) [originally -307.652656(-0x1.33a7146f72a42p+8)]
log10 ( 60.000000(0x1.ep+5) ) -> 1.778151(0x1.c734eb9bbd4p+0) [originally 1.778151(0x1.c734eb9bbd3ffp+0)]
log ( 6.000000(0x1.8p+2) ) -> 1.791759(0x1.cab0bfa2a2003p+0) [originally 1.791759(0x1.cab0bfa2a2002p+0)]
log10 ( 3.500000(0x1.cp+1) ) -> 0.544068(0x1.1690163290f41p-1) [originally 0.544068(0x1.1690163290f4p-1)]
log10 ( 22.500000(0x1.68p+4) ) -> 1.352183(0x1.5a28a22d82ep+0) [originally 1.352183(0x1.5a28a22d82dffp+0)]
sin ( 10000.000000(0x1.388p+13) ) -> -0.305614(-0x1.38f2fa75d9288p-2) [originally -0.305614(-0x1.38f2fa75d9289p-2)]
sin ( 9999.000000(0x1.3878p+13) ) -> 0.636087(0x1.45ad3086449bep-1) [originally 0.636087(0x1.45ad3086449bdp-1)]
sin ( 9998.000000(0x1.387p+13) ) -> 0.992973(0x1.fc66f13ab2e0dp-1) [originally 0.992973(0x1.fc66f13ab2e0cp-1)]
sin ( 9997.000000(0x1.3868p+13) ) -> 0.436924(0x1.bf6909b0793f7p-2) [originally 0.436924(0x1.bf6909b0793f6p-2)]
sin ( 9996.000000(0x1.386p+13) ) -> -0.520831(-0x1.0aaa510fa1ffep-1) [originally -0.520831(-0x1.0aaa510fa1fffp-1)]

So lets check which Math functions were called and how many times:

$ cut -d' ' -f1 mylog.log | sort | uniq -c | sort -n
      1 log
      4 pow
      6 log10
  20001 cos
  20001 sin

The output is not very surprising. One can easily check via MFUZZ_FUZZ=sin and MFUZZ_FUZZ=cos that fuzzing only one of the functions is indeed enough to make the test fail.

Summary

This tool should help package authors who empirically set tolerances in their tests do some preventative maintenance. If the package normally passes its tests, but fails with mfuzz, the tolerances should be increased correspondingly. It is not guaranteed that doing this is enough to prevent any numerical issues with a new C Math library on a new platform, because the differences may be bigger than what mfuzz tries. Likewise, it is not guaranteed it would be enough for the same code to work on Windows with mingw-w64 v12 or newer.

To test and debug a package with tests of numerical results, it is better to use mdebug. One can also install an experimental build of Rtools45, which uses mingw-w64 v13, and test with it, but this has to be done with care to avoid polluting the package library on the system with code generated by an experimenral build of Rtools. One would have to rebuild R from source and all dependencies of the package under test. One would have to do that in a fresh library, and then delete the library after the experiment, and uninstall this Rtools45. Still, this experimental build confirmed that most CRAN packages that help back the upgrade to mingw-w64 v12 in Rtools45 due to over-sensitivity to C Math function accurracy, have been fixed.

Probably Rtools46 will use mingw-w64 v13, so the testing will be possible after Rtools46 release, but the testing will also become mandatory, as R-devel (and then R 4.6) will likely use Rtools46.

The mfuzz tool also helped to discover some tolerances in base R that needed to be adjusted.

This work also helped to make it clear that if R wants to provide more reasonable behavior of R math functions than no guarantees on any properties, it needs to implement such properties on its own.

This work has been possible thanks to an investment of the Sovereign Tech Fund.