If you use R you may have wondered if there are ways you can contribute to making R better. An important feature of R that encourages its use around the world is the support for localization. This enables R’s messages, warnings and errors, as well as menu labels in the Windows and Mac OS GUIs, to be shown in the user’s local language.
Localization relies on translations that are contributed and maintained by volunteer translation teams. We recently ran a series of Collaboration Campfires, where we explored what motivates people to contribute translations, the current status of translations in R and how people can get involved. In this post we share the insights from those sessions.
Why get involved in translation?
An obvious reason for helping to translate R messages is that it makes R more accessible to non-English speakers, especially in communities where a working knowledge of English is uncommon. If you are a package developer, it is useful to learn about the translation infrastructure, because the same infrastructure can be used to add translations to your own package(s). Indeed, the same infrastructure is used by other open source projects. Contributing translations to the R project is a good starting point for learning about and contributing to the development of R more widely. As with any open source contribution, there is the benefit of building your knowledge and your network as you interact with other developers. It can also be a nice addition to your CV/resume!
Current status of translations in R
The translations of R messages are stored in PO files, a plain-text file format for use with the GNU gettext software. The source code for each package in base R has a po directory which contains the PO files. There are up to three PO files for each package: one for messages contained in the R code, one for messages contained in the C code, and for the base package only, one for text displayed in the Windows GUI.
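To make the three kinds of PO file concrete, here is a minimal Python sketch that sorts file names from a po directory into those kinds. It assumes the naming pattern `R-<lang>.po` for R code messages, `<lang>.po` for C code messages and `RGui-<lang>.po` for the Windows GUI. That pattern is not stated in this post, so check it against the R sources before relying on it. The function name `po_file_kind` is made up for illustration.

```python
def po_file_kind(filename):
    """Return (kind, language) for a PO file name, or None if it is not a .po file.

    Assumes the naming pattern R-<lang>.po / <lang>.po / RGui-<lang>.po.
    """
    if not filename.endswith(".po"):
        return None  # e.g. .pot templates or compiled catalogs
    stem = filename[: -len(".po")]
    # Check the longer prefix first so "RGui-" is not mistaken for "R-".
    if stem.startswith("RGui-"):
        return ("Windows GUI", stem[len("RGui-"):])
    if stem.startswith("R-"):
        return ("R code", stem[len("R-"):])
    return ("C code", stem)


if __name__ == "__main__":
    for name in ["R-fr.po", "fr.po", "RGui-fr.po", "base.pot"]:
        print(name, "->", po_file_kind(name))
```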
Therefore we can explore the current status of translations in R by extracting the PO files from the 14 packages in base R. Some translation teams also provide translations for the Recommended packages, Mac OS GUI, and the Windows installer, which we don’t consider here. For each message, we can determine whether a translation is available for a particular language and, if so, whether it is up-to-date or whether the message has changed since it was translated, i.e. the translated message is “fuzzy”.
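As a rough illustration of this classification, the Python sketch below reads PO-formatted text and counts messages as translated, fuzzy or untranslated. It is a simplified reader written for this example, not the code used for the analysis in this post: it ignores plural forms (`msgid_plural`, `msgstr[0]`), so plural entries would be counted as untranslated, and the sample messages are invented.

```python
import re

# Invented sample in PO format: a header entry plus three messages.
SAMPLE_PO = r'''
msgid ""
msgstr ""
"Project-Id-Version: R\n"

msgid "invalid argument"
msgstr "argument invalide"

#, fuzzy
msgid "object not found"
msgstr "objet introuvable"

msgid "missing value"
msgstr ""
'''


def unquote(token):
    """Strip the surrounding double quotes from a PO string token."""
    return token.strip()[1:-1]


def parse_entries(text):
    """Yield (msgid, msgstr, is_fuzzy) for each blank-line-separated entry."""
    for block in re.split(r"\n\s*\n", text.strip()):
        fields = {"msgid": "", "msgstr": ""}
        is_fuzzy = False
        current = None
        for line in block.splitlines():
            line = line.strip()
            if line.startswith("#,"):
                # Flag comments carry the "fuzzy" marker.
                is_fuzzy = is_fuzzy or "fuzzy" in line
            elif line.startswith("#"):
                continue  # other comments, including obsolete "#~" entries
            elif line.startswith("msgid "):
                current = "msgid"
                fields[current] = unquote(line[len("msgid "):])
            elif line.startswith("msgstr "):
                current = "msgstr"
                fields[current] = unquote(line[len("msgstr "):])
            elif line.startswith('"') and current:
                fields[current] += unquote(line)  # continuation line
            else:
                current = None  # unsupported keyword, e.g. plural forms
        yield fields["msgid"], fields["msgstr"], is_fuzzy


def summarize(text):
    """Count messages by status, skipping the header entry (empty msgid)."""
    counts = {"translated": 0, "fuzzy": 0, "untranslated": 0}
    for msgid, msgstr, is_fuzzy in parse_entries(text):
        if msgid == "":
            continue
        if not msgstr:
            counts["untranslated"] += 1
        elif is_fuzzy:
            counts["fuzzy"] += 1
        else:
            counts["translated"] += 1
    return counts


if __name__ == "__main__":
    print(summarize(SAMPLE_PO))
```

Dividing the translated count by the total number of messages gives a coverage figure for one file.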
We can see that there are a few languages with near-complete, correct translations: French, Italian, Russian, and Lithuanian. Then there is a group of languages with slightly lower coverage and a higher proportion of fuzzy messages: German, Polish, Chinese (Traditional), Japanese, Chinese (Simplified) and Korean. There is a third group with only about a third of the messages translated: Norwegian (Nynorsk), Turkish, Danish and Spanish. Only the GUI messages have been translated into Persian. The standard English messages have been translated into British English for a few cases in the base and grDevices packages. Finally there is one message in standard English that adds information about the locale to the startup message in R.
The metadata in the PO files includes both the date that the English messages were last updated and the date the translation was last updated. The plot below represents the last translation date as a lag time in years from the last message update, for each PO file.
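The lag can be computed from the PO header along the following lines. This Python sketch assumes the two dates are held in the standard gettext header fields `POT-Creation-Date` (when the English messages were last extracted) and `PO-Revision-Date` (when the translation was last revised). It reads only the `YYYY-MM-DD` part and returns `None` when a date is missing or unfilled. The function names and sample headers are invented for illustration.

```python
import re
from datetime import date


def header_date(po_text, field):
    """Return the date in the given header field, or None if absent or unfilled."""
    match = re.search(re.escape(field) + r":\s*(\d{4})-(\d{2})-(\d{2})", po_text)
    if match is None:
        return None
    year, month, day = map(int, match.groups())
    return date(year, month, day)


def translation_lag_years(po_text):
    """Years between the last translation update and the last message update."""
    created = header_date(po_text, "POT-Creation-Date")
    revised = header_date(po_text, "PO-Revision-Date")
    if created is None or revised is None:
        return None
    return (created - revised).days / 365.25


# Invented headers: one with both dates, one with an unfilled revision date.
DATED = r'''
msgid ""
msgstr ""
"POT-Creation-Date: 2022-04-01 10:00\n"
"PO-Revision-Date: 2017-04-01 09:30+0100\n"
'''

UNDATED = r'''
msgid ""
msgstr ""
"POT-Creation-Date: 2022-04-01 10:00\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
'''

if __name__ == "__main__":
    print(translation_lag_years(DATED))    # roughly 5 years
    print(translation_lag_years(UNDATED))  # None: revision date never filled in
```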
The last translation date was missing for some of the PO files; in particular, no dates were available for Italian. However, the plot shows a clear correspondence to the previous plot: the languages with higher translation coverage have been updated closer to when the English messages were last updated. The languages with poor coverage have not been updated for at least 5 years prior to the time the English messages were last updated. For Chinese (Simplified), most of the files have not been updated for at least 10 years.
For the languages with lower coverage, we can explore the choices translation teams have made regarding which messages to prioritize for translation. The plot below compares the coverage by package for the languages with lower coverage:
Norwegian and Spanish translations are only available for the base and graphics packages. Turkish translations cover a few more packages, including about half of the messages in the stats and stats4 packages. Brazilian Portuguese and Danish translations are available for all packages (there are no messages to translate in the datasets package), but for several packages the proportion of translated messages is very low (less than a quarter).
How can you help?
Clearly there is a lot of scope for people to contribute new translations of messages, or to update translations that are no longer correct. The first step is to learn more about how translations are added to R packages. We recommend starting with the Translating R to your Language tutorial from useR! 2021 - you can watch the video and/or read the slides.
Once you have a basic understanding of the process, find the contact person for the language(s) you can contribute to on the Translation Teams page. If your language is not there, or the team requires a new maintainer, post a message on the #core-translations channel of the community-run R Contributors Slack or on the R-Devel mailing list to offer your help.
The maintainer or help channels should be able to tell you how to contribute for a specific language. Some translation teams maintain translations on GitHub, e.g. Italian and French. The Hungarian team is trialing a Weblate server that allows translations to be contributed via a browser. Several contributors involved in translation are active on the #core-translations channel of the R Contributors Slack, so that is a good place to ask any questions on how to get started or deal with any issues you encounter as you start contributing.
There are a number of works in progress that will provide additional support in the future:
- R translations lesson: a self-paced online lesson based on the Collaboration Campfire activities.
- R translations dashboard: a Google Summer of Code project that will create a dashboard to monitor the status of translations in R.
- R Development Guide: a new chapter on contributing translations is planned as part of Google Season of Docs 2022.
Thanks to the participants of the Collaboration Campfire “Explore R’s Process for Localization (Translation)” who provided ideas and draft visualizations for this blog post: Shimelis Abebe Tegegn, Iman Al Hasani, Michael Blanks, Michael Chirico, Toby Dylan Hocking, Pawan Jangra, Ella Kaye, Piyush Kumar, Beatriz Milz, Kozo Nishida, Lucy Njoki Njuki, Riva Quiroga, Marcel Ramos, and Ben Ubah.