Alternative toolchains on Windows



R and R packages on Windows, if they include native code, are built using compiler toolchain and libraries from Rtools. There is always a specific version of Rtools for a given version of R. Rtools44 is used for R 4.4.x, and hence R 4.4.x and packages for that version of R are always built with GCC 13.3 for x86_64 systems and with LLVM 17 for aarch64 systems. This is the case at least since Rtools42 (R 4.2 was released in April, 2022).

Having a fixed toolchain is normally a benefit, but it is sometimes useful to be able to test the code of R and packages with other toolchains.

This blog post is about recent improvements in R-devel (to become R 4.5) which allow this without changing the code and allow the use of sanitizers. It may be of interest to maintainers of packages that include native code (C, Fortran or C++), and to people working the the source code of base R.

Benefits of a fixed toolchain

Having a fixed toolchain has certain benefits. When all users of a package use that package built via the same toolchain, it is very unlikely they would run into compilation problems not seen by the developers such as unsupported options, unsupported language features or incompatibility with system headers. It doesn’t matter whether users download package binaries say from CRAN or Bioconductor or build them from source: all of these use the same Rtools. To some extent, this also reduces the risks users would run into runtime bugs caused by compiler differences, even though this depends on the quality of testing.

Unlike Linux, there is no established software distribution of open-source software that could be expected to exist on Windows users’ machines. R itself and R packages hence link such needed external dependencies statically: the R executable, R shared libraries and shared libraries of R packages already include all the needed code, which avoids maintenance problems that would be otherwise imposed on the users, say if dynamic linking of external dependencies was used.

Particularly on Windows, it is not safe to link code compiled by different toolchains statically due to possible differences in runtime (may include C runtime, compiler runtime, C++ library, etc). All code linked statically should be compiled by exactly the same toolchain.

Having a fixed toolchain then makes this easy to achieve without rebuilding always everything from scratch: the toolchain is distributed together with static libraries compiled by that toolchain. R itself is built by the same toolchain. This works as long as packages only use either pre-compiled code from Rtools or code that is always compiled using the toolchain from Rtools at package installation time (i.e. source code embedded in the package). On CRAN this is a requirement.

Some problems may arise in a single address space even with dynamic linking, when different dynamically linked parts are built by different toolchains. These problems are less common, sometimes may be avoided by careful interface design, some have been avoided by switching to UCRT as the Windows C runtime (in R 4.2).

With R, dynamic linking is used primarily between R, R packages, and wrapper libraries provided by R, which are also built by Rtools. Indeed, some dynamic libraries linked are built by different toolchains: system libraries and in rare cases some external dependencies of R packages.

Rtools used for a given minor version of R, e.g. Rtools44, are usually updated several times and frozen some time before the next R minor release, e.g. R 4.5.0. This could mean that some dynamically linked parts are built by a slightly different version of Rtools44, e.g. R release built with a different version than an already installed package or previously built and not-yet-rebuilt binary version of the package. To minimize these risks, these small updates in Rtools do not change the toolchain: typically they use the same version of GCC or only increase a patch version, the same version of binutils and the mingw-w64 runtime.

Drawbacks of a fixed toolchain

Some of the benefits mentioned above are also drawbacks. A fixed environment may be good at hiding errors, but then, they wouldn’t be found by testing.

These errors then may show up when the compiler toolchain is actually updated in a new version of Rtools. Problems may also show up when a new architecture is added and uses a different toolchain, such as LLVM toolchain on aarch64.

These problems in practice are usually found in Windows-specific code and Windows-specific configuration (e.g. make files), because non-portabilities in code used also on Unix would be likely quickly found on Linux, where different distributions and installations use different toolchains.

Testing with alternative toolchains

As described above, it is key that all code in packages and in R is built by the same toolchain. So all of this only applies to testing, when one builds all of R and R packages completely from source. Binaries built by different/alternative toolchains should never be distributed to users. It is only recommended for experienced package authors to experiment with this testing and it requires knowledge of Rtools as well as Msys2.

Testing base R with alternative toolchain in the past required changes in R source code (make files). It was still done time to time, but with the current R-devel, one can use at least some alternative toolchains without modifying the code.

This is useful for better testing of R itself, because new compilers may introduce stricter checks that discover bugs in the code that could result potentially in incorrect results or crashes even with other (or older) compilers. So, this is useful even though for Windows, we use a fixed compiler, which is known before the corresponding release of R.

There are additional benefits of testing with additional toolchains described below.

R-devel now by default uses pkg-config for figuring out which external libraries to link and which C preprocessor directives to use for those libraries from which library/include directories. R 4.4 and earlier used a fixed setup tuned for the corresponding version of Rtools.

pkg-config is supported both by Rtools and by toolchains in Msys2. This made it easier to encapsulate the differences between toolchains and compiled libraries, even though it did not completely erase the need for tweaks to support Msys2 toolchains. Not all Msys2 toolchains would work at the moment with R. Not all are necessary, but the interesting ones for testing are “ucrt64” and “clang64”.

One can use the “ucrt64” toolchain from Msys2 to test a recent GCC on x86_64. Especially around the end of the year, Msys2 will have a newer version of GCC than Rtools issued the year before. At this time, it is 14.2, which could easily be the toolchain version for the next Rtools. There is no special configuration needed. R make files with find the compiler toolchain on PATH. So, package authors who would want to prepare their packages a bit even before Rtools45 is released, can do it using “ucrt64”.

The “clangarm64” Msys2 toolchain on aarch64 would serve the same purpose. Package authors can test their packages with a newer version of LLVM than Rtools has, around this time of the year. However, this requires access to aarch64 hardware.

The “clang64” toolchain from Msys2 can be used to test R and packages with the LLVM compiler toolchain, including the flang-new Fortran compiler. This allows to test compatibility of a package with LLVM for those who do not have access to a Windows/aarch64. Many of the issues found in R packages on Windows/aarch64 are in fact because of using non-portable GCC features or options that are not supported by LLVM. Only one bit of special configuration is needed, set USE_LLVM = 1 in MkRules.local, as is done when building on Windows/aarch64.

To be able to test R package with these alternative toolchains, it is necessary to use only static libraries already available on the system of the user (so, in real use static libraries only from Rtools, but in these tests, they will be from the corresponding toolchain from Msys2). One also needs to use pkg-config so that the proper library dependencies and C preprocessor directives apply to the toolchain.

One needs to specify all pkg-config packages directly used by the R package when querying pkg-config for libraries and flags. This may not always be the case normally when Rtools is used, because Rtools uses static linking, so pkg-config with Rtools would include to the link list all static dependencies of the libraries, some of which may be also directly used by the R package, but may belong to other pkg-config packages.

Msys2 uses dynamic linking. Dynamic libraries remember their dependencies on their own, so pkg-config will not include those in the list of libraries to link. Building with an alternative Msys2 toolchain is hence a good test of whether the specification is complete.

Now, it should be said that even with Rtools, packages get tested by multiple toolchains when they are tested with multiple versions of R. CRAN currently tests R-devel, the current release and the previous release. Also, a new release of Rtools is usually available some time before an R release, so that packages using R-devel may be tested with it.

Testing with sanitizers

Sanitizers are useful tools for finding different bugs in programs, including memory access/allocation errors and relying on undefined behavior. In some cases, sanitizers have discovered R memory protection (“PROTECT”) errors.

Unfortunately, GCC currently does not support sanitizers on Windows. LLVM supports sanitizers on x86_64, but not on aarch64. Rtools hence doesn’t support sanitizers on any platform.

With the Msys2 “clang64” toolchain, one may now use sanitizers by setting SANOPTS in MkRules.local to a suitable value, such as -fsanitize=address,undefined,bounds, which is then passed to the compiler and linker where needed during compilation of R itself and then of packages.

The new sanitizer support has been used with base R and the discovered issues, where possible, were fixed. These issues included code that is only used on Windows, but also common code where differences between Unix and Windows mattered (e.g. size of integer, long vs size of pointer).

Testing the sanitizer support with packages only extended to packages distributed with base R. Problems may appear when checking other packages.