C math library functions, such as exp or sin, are not guaranteed to be
“precise”. The results might be slightly different on different platforms.
A recent change in mingw-w64 v12, which is a core dependency of compilers we
use on Windows (both GCC and LLVM), resulted in failures in about 20 CRAN
packages. Rtools45 uses mingw-w64 v11, the same version as Rtools44, to
allow time for resolving the issues. The change in mingw-w64 v12 switched
about 90 math functions from internal implementations to UCRT, the Windows C
runtime provided by Microsoft and shipped with the OS. The goal of the
change was improving performance and reducing maintenance cost of mingw-w64.
In principle, this is not only a problem on Windows. It is not hard to come across R package tests, which test that numerical results of some computation is within bounds, when the bounds were apparently established experimentally on several platforms at hand at the time the tests were written. These tests then commonly start failing on a new CPU or with a new compiler, and they may be difficult to debug, especially to a person who is not a maintainer of the package.
Section 1.6 Writing portable packages of Writing R Extensions has more details in th paragraph stating with:
Only test the accuracy of results if you have done a formal error analysis. Things such as checking that probabilities numerically sum to one are silly: numerical tests should always have a tolerance. That the tests on your platform achieve a particular tolerance says little about other platforms. R is configured by default to make use
This blog post presents the mfuzz tool for Linux, which fuzzes the results from C Math functions. So if establishing tolerance intervals empirically, despite the suggestion cited above, one might use the tool to get somewhat wider intervals. As of this writing, the tool fuzzes the results deterministically, adding 1 ULP (unit of least precision) towards positive infinity, but if that were to change the sign of the result, then away from zero.
The tool tries to be reasonable about fuzzing and tries to preserve certain
properties which would likely hold on real C Math library implementations
and which hold in Mathematics, such as that sin(0) == 0, cos(0) == 1
(exactly), etc. In principle, rounding functions are also C Math library
functions, but they are indeed not fuzzed - one would normally only fuzz
transcendental functions (via environment variable MFUZZ_FUZZ=trans).
Making the tool sufficiently reasonable in what is fuzzed is joint work with
Martin Maechler from R Core.
In principle, the tool could be extended to make the fuzzing somewhat random, or bigger for some functions and some intervals. The 1 ULP fuzzing has been motivated by that all over-sensitivity bugs found in R CRAN packages when debugging issues with mingw-w64 v12 were caused by differences in numerical results of only 1 ULP.
Interested readers are referred to previous blog posts on the topic fromJanuary from April and May.
Wrapping Math library calls
The tool (mfuzz) implements
configurable wrappers most of the C Math library functions. These are built
into a shared library, which is then provided to the dynamic linker via
LD_PRELOAD and is used in precedence of the system C Math library. The
wrappers in wrapppers.c are generated by an R script genwrappers.r, so
it is best to modify the script when changes are needed.
Unlike the mdebug tool, which is for Windows, one doesn’t have to rebuild R to use mfuzz, which is for Linux.
Example
Let’s have this R test for a demonstration how the tool can be used:
x <- -1e4:1e4
y <- sin(x)^2 + cos(x)^2
stopifnot(max(1-y) <= .Machine$double.eps)
This is an example of an unreasonable test to be in an R package, because it
tests the R math functions only without any relevant added value/computation,
and because the tolerance limit is arbitrary. Still, the test passes on my
Linux, Windows 11 and macOS systems, but it fails on Linux with mfuzz:
env LD_PRELOAD=~/mfuzz/libmfuzz.so MFUZZ_FUZZ=trans ./bin/R
[...]
> x <- -1e4:1e4
y <- sin(x)^2 + cos(x)^2
stopifnot(max(1-y) <= .Machine$double.eps)
Error: max(1 - y) <= .Machine$double.eps is not TRUE
>
Lets check which Math functions were called and fuzzed in the execution:
env LD_PRELOAD=~/mfuzz/libmfuzz.so MFUZZ_FUZZ=trans MFUZZ_LOG=trans MFUZZ_LOGFILE=mylog.log ./bin/R
The log file looks as follows:
$ head mylog.log
log10 ( 0.000000(0x1p-1022) ) -> -307.652656(-0x1.33a7146f72a41p+8) [originally -307.652656(-0x1.33a7146f72a42p+8)]
log10 ( 60.000000(0x1.ep+5) ) -> 1.778151(0x1.c734eb9bbd4p+0) [originally 1.778151(0x1.c734eb9bbd3ffp+0)]
log ( 6.000000(0x1.8p+2) ) -> 1.791759(0x1.cab0bfa2a2003p+0) [originally 1.791759(0x1.cab0bfa2a2002p+0)]
log10 ( 3.500000(0x1.cp+1) ) -> 0.544068(0x1.1690163290f41p-1) [originally 0.544068(0x1.1690163290f4p-1)]
log10 ( 22.500000(0x1.68p+4) ) -> 1.352183(0x1.5a28a22d82ep+0) [originally 1.352183(0x1.5a28a22d82dffp+0)]
sin ( 10000.000000(0x1.388p+13) ) -> -0.305614(-0x1.38f2fa75d9288p-2) [originally -0.305614(-0x1.38f2fa75d9289p-2)]
sin ( 9999.000000(0x1.3878p+13) ) -> 0.636087(0x1.45ad3086449bep-1) [originally 0.636087(0x1.45ad3086449bdp-1)]
sin ( 9998.000000(0x1.387p+13) ) -> 0.992973(0x1.fc66f13ab2e0dp-1) [originally 0.992973(0x1.fc66f13ab2e0cp-1)]
sin ( 9997.000000(0x1.3868p+13) ) -> 0.436924(0x1.bf6909b0793f7p-2) [originally 0.436924(0x1.bf6909b0793f6p-2)]
sin ( 9996.000000(0x1.386p+13) ) -> -0.520831(-0x1.0aaa510fa1ffep-1) [originally -0.520831(-0x1.0aaa510fa1fffp-1)]
So lets check which Math functions were called and how many times:
$ cut -d' ' -f1 mylog.log | sort | uniq -c | sort -n
1 log
4 pow
6 log10
20001 cos
20001 sin
The output is not very surprising. One can easily check via MFUZZ_FUZZ=sin
and MFUZZ_FUZZ=cos that fuzzing only one of the functions is indeed enough
to make the test fail.
Summary
This tool should help package authors who empirically set tolerances in
their tests do some preventative maintenance. If the package normally
passes its tests, but fails with mfuzz, the tolerances should be increased
correspondingly. It is not guaranteed that doing this is enough to prevent
any numerical issues with a new C Math library on a new platform, because
the differences may be bigger than what mfuzz tries. Likewise, it is not
guaranteed it would be enough for the same code to work on Windows with
mingw-w64 v12 or newer.
To test and debug a package with tests of numerical results, it is better to use mdebug. One can also install an experimental build of Rtools45, which uses mingw-w64 v13, and test with it, but this has to be done with care to avoid polluting the package library on the system with code generated by an experimenral build of Rtools. One would have to rebuild R from source and all dependencies of the package under test. One would have to do that in a fresh library, and then delete the library after the experiment, and uninstall this Rtools45. Still, this experimental build confirmed that most CRAN packages that help back the upgrade to mingw-w64 v12 in Rtools45 due to over-sensitivity to C Math function accurracy, have been fixed.
Probably Rtools46 will use mingw-w64 v13, so the testing will be possible after Rtools46 release, but the testing will also become mandatory, as R-devel (and then R 4.6) will likely use Rtools46.
The mfuzz tool also helped to discover some tolerances in base R that needed to be adjusted.
This work also helped to make it clear that if R wants to provide more reasonable behavior of R math functions than no guarantees on any properties, it needs to implement such properties on its own.
This work has been possible thanks to an investment of the Sovereign Tech Fund.