A minimalistic approach to transparently translating your application

Translating an application is not trivial, but it shouldn't be complicated. Using existing tools, we can flag the text that needs to be translated and extract it into resource files. Those files are then populated, either by human translators or with machine-generated content, and the translation software transparently does the rest. ¡Vamos allá! (Let's go!)

Representation of multi language platform

There is a set of features we will generally be interested in when setting up a translation system. Firstly, we want the application code to remain agnostic about the translation process. Ideally we want to write the code in the default language (e.g. English) and let the translations be resolved at runtime, depending on the language the user has selected.

Secondly we will want to minimize the overhead of extracting the translations into resource files. If we have a system in place to parse the code, detect translatable text and extract it into the resource files automatically, we can focus on writing code without having to manually maintain JSON files.

Finally, we will want the translation to be dynamic. If we are translating a client app, there is no reason for the translation to require a page reload. We should be able to translate the static texts dynamically, without altering the state of the application. All we need are the corresponding resources, regardless of where they come from (e.g. always bundled with the static assets or lazy loaded from the server).

With the requirements clear, let's first write some code to translate a sample client app. The code will be simple and not suitable for a real-life scenario, but it will help us understand which challenges need to be addressed and why it is worth using existing tools. Let's start by introducing the sample client app.

It is a trivial client app, but it has three representative types of text that need to be translated: function string parameters (e.g. the alert argument), static text (e.g. the button name) and text with dynamic values (e.g. Hello {name}). It is written in React but the same ideas can be applied to other frameworks (e.g. Angular).

In order to translate the app we will first need to modify the UI to include a language selector and store the selected language as part of the application state. Storing it via useState will take care of updating the application text every time the selected language changes. Having done that, we can start with the actual translation part 💪

Screenshot of the sample app including a language selector

Custom code: translate

A simple and effective approach consists of passing all the text that needs to be translated to a translation function, which receives the user-selected language as the second argument. The translation function then uses the selected language and the translatable text to retrieve the corresponding translation from a resources object.
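As a rough sketch of that idea, the function below looks up the text in a per-language dictionary and falls back to the original text when no translation is found. The `Language` type, the `resources` contents and the Spanish sample string are assumptions made for illustration; they are not taken from the sample app.

```typescript
// Hypothetical language codes, for illustration only.
type Language = 'en' | 'es';

// One dictionary per language, indexed by the English text.
type Resources = Record<Language, Record<string, string>>;

// Hypothetical resources object; a real one would be generated from the code.
const resources: Resources = {
  en: { 'Click me': 'Click me' },
  es: { 'Click me': 'Haz clic' },
};

// Returns the translation for the selected language, or the original
// (English) text when the translation is missing or empty.
function translate(text: string, language: Language): string {
  const translation = resources[language][text];
  return translation ? translation : text;
}
```

In a component this would be used as `translate('Click me', language)`, with `language` coming from the application state.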

We could use explicit keys instead of the English translatable text. For example: {translate('AREA_COMPONENT-DISCRIMINATOR', language)}. That would give us more control over the management of translations, but it decreases code readability, as the actual content will only be present in the resource files. Have a look at this alternative if you are interested.

The cornerstone of this approach is therefore building the resources object. We want to avoid having to do that manually, so let's write a simple parser to do it for us. It must take a set of code files, locate all the calls to the translate function, obtain the first argument of each call and add it to a dictionary. We will use the actual English text as the dictionary keys. Here is a minimal TypeScript implementation of such a parser:
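The core of such a parser could be sketched as follows. This version works on code that has already been read into strings (reading the files and writing the JSON output are left out), and the names `extractTexts` and `buildResources` are hypothetical. It assumes the first argument is a plain single- or double-quoted string with no escaped quotes.

```typescript
// Collects the first argument of every translate('...') or translate("...") call.
// Naive on purpose: no escaped quotes, no template literals, no comments handling.
function extractTexts(code: string): string[] {
  const pattern = /translate\(\s*(?:'([^']*)'|"([^"]*)")/g;
  const texts: string[] = [];
  let match: RegExpExecArray | null = pattern.exec(code);
  while (match !== null) {
    // Group 1 holds single-quoted text, group 2 holds double-quoted text.
    const text = match[1] !== undefined ? match[1] : match[2];
    if (text !== undefined && texts.indexOf(text) === -1) {
      texts.push(text);
    }
    match = pattern.exec(code);
  }
  return texts;
}

// Builds one dictionary per language, indexed by the English text.
// The default language gets the text itself; the others get empty values.
function buildResources(
  codeFiles: string[],
  languages: string[],
  defaultLanguage: string = 'en'
): Record<string, Record<string, string>> {
  const result: Record<string, Record<string, string>> = {};
  languages.forEach((language) => {
    const dictionary: Record<string, string> = {};
    codeFiles.forEach((code) => {
      extractTexts(code).forEach((text) => {
        dictionary[text] = language === defaultLanguage ? text : '';
      });
    });
    result[language] = dictionary;
  });
  return result;
}
```

Serializing the result with `JSON.stringify(result, null, 2)` would give a resources file ready to be populated.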

A reliable parser must support many other scenarios, but this naive implementation (it only supports single- or double-quoted strings, with no escaped quotes) is good enough for demonstration purposes. Running it (i.e. npx ts-node update-translations.ts) will generate the following JSON resources file, with empty values for all languages other than English.

Note that the translation function returns the English text by default, so we don't necessarily need to include the English text in the resources dictionary. Here I'm doing it for consistency.

We could call an external API to provide machine-generated translations, but I'm not going to address that in this article. At this point, and given we only have a few translations, we can populate the file manually. Once that is done, the translate function has all it needs to transparently replace the English text with the corresponding text in the selected language. Sweet! We have a first working version.

In some cases we might want to index the resources dictionaries with specific keys (e.g. using fixed length alphanumeric codes or UUIDs). It is not complicated to achieve; both the translate function and the parser will need to call an additional function to obtain the text key. Have a look at this alternative if you are interested.
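One possible shape for that additional function is a deterministic hash of the English text, so the parser and the translate function derive the same key independently. The sketch below is an assumption, not the approach from the linked alternative: `getTextKey` is a hypothetical name and it uses a simple djb2-style hash, which can produce collisions for different texts.

```typescript
// Derives a fixed-length (7 characters) alphanumeric key from a text.
// djb2-style hash kept within unsigned 32 bits, then encoded in base 36.
function getTextKey(text: string): string {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    hash = (hash * 33 + text.charCodeAt(i)) >>> 0;
  }
  // A 32-bit value needs at most 7 base-36 digits; pad shorter ones with zeros.
  return ('0000000' + hash.toString(36)).slice(-7);
}
```

The parser would store `getTextKey(text)` as the dictionary key, and the translate function would look up `resources[language][getTextKey(text)]` instead of indexing by the text itself.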

Custom code: interpolation

You might have noticed that the string we are passing to the alert function requires two separate translate calls. While it does the trick, it also breaks the consistency of the resource files. This happens because the position of the dynamic value within the message might be different for each language. To get around that, we need to deliberately set some resources to be whitespace:

It also means including trailing whitespace in the translations, which can easily be lost by translation tools. To resolve this problem, most translation libraries allow using templates/placeholders in the translatable text. This concept is known as interpolation and, once we have chosen a syntax to flag templates in the translatable text (e.g. 'Text [[template]]'), it is relatively simple to implement.

One way to do it is by adding a third parameter to the translate function so it accepts a map of values. The map is expected to have a property for each template in the string parameter, and we will use the value of each property to replace the corresponding template in the string. Sounds more complicated than it actually is:
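A minimal sketch of that third parameter, assuming the `[[template]]` syntax mentioned above. The `interpolate` helper, the `Values` type and the inline `resources` object are hypothetical names introduced for this example.

```typescript
type Language = 'en' | 'es';
type Values = Record<string, string | number>;

// Hypothetical resources, indexed by the English text (templates included).
const resources: Record<Language, Record<string, string>> = {
  en: { "Hi! This is [[name]]'s laptop": "Hi! This is [[name]]'s laptop" },
  es: { "Hi! This is [[name]]'s laptop": 'Ey! Soy el ordenador de [[name]]' },
};

// Replaces every [[property]] template with the matching value.
// Templates without a matching property are left untouched.
function interpolate(text: string, values: Values = {}): string {
  return text.replace(/\[\[(\w+)\]\]/g, (template: string, name: string) => {
    const value = values[name];
    return value === undefined ? template : String(value);
  });
}

// Translates first, then interpolates, so each language can place
// the dynamic value wherever its grammar requires.
function translate(text: string, language: Language, values?: Values): string {
  const translation = resources[language][text];
  return interpolate(translation ? translation : text, values);
}
```

The alert call would then need a single translate call, e.g. `translate("Hi! This is [[name]]'s laptop", language, { name })`.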

Now "Hi! This is [[name]]'s laptop" becomes "Hi! This is World's laptop" (or, in Spanish, "Ey! Soy el ordenador de World"). This is starting to look solid 👌

Custom code: declarative component

One last thing before moving on to using existing libraries. With the addition of the translate function, the perfectly readable multiline text has become a long single-line string. It would be more comfortable to have a Translate component that allows for multiline text and implicitly calls the function with its children as the first parameter.

Much nicer. This simple change has consequences, however. On one hand, the HTML/JSX code can contain line breaks and multiple whitespace characters that we will not want to include in the resource files. We will need to remove such characters in the parser. And, additionally, because the resource files are indexed using the English content, we will also need to remove those characters in the Translate component (note the parseHtml call).
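The clean-up itself can be a one-liner, as long as the parser and the component share it so that both arrive at the same dictionary key. The sketch below assumes this is roughly the role parseHtml plays; `normalizeWhitespace` is a hypothetical name.

```typescript
// Collapses line breaks, tabs and repeated spaces into single spaces
// and removes leading/trailing whitespace.
function normalizeWhitespace(text: string): string {
  return text.replace(/\s+/g, ' ').trim();
}
```

With it, a multiline JSX child such as a line break, some indentation and the words "Hello world" spread over two lines resolves to the key `Hello world`.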

On the other hand, we will need to modify the parser to extract the text inside the Translate components, as well as the strings passed to the translate function. Here things get complicated, as HTML/JSX cannot really be parsed using regular expressions. We will modify our parser so it covers the sample app (using a multiline regular expression that matches text which doesn't contain the > character), but a solid approach requires building the abstract syntax tree of the code.
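For illustration, a regular expression along those lines could look like the sketch below. It is deliberately fragile: it assumes the Translate children are plain text with no nested elements or `>` characters, and `extractComponentTexts` is a hypothetical name.

```typescript
// Extracts the text inside <Translate>...</Translate> elements.
// [^>] also matches line breaks, so the content can span several lines,
// but any nested element (or a literal '>') prevents the match.
function extractComponentTexts(code: string): string[] {
  const pattern = /<Translate(?:\s[^>]*)?>([^>]*)<\/Translate>/g;
  const texts: string[] = [];
  let match: RegExpExecArray | null = pattern.exec(code);
  while (match !== null) {
    const content = match[1] !== undefined ? match[1] : '';
    // Same whitespace clean-up the component needs to apply at runtime.
    const text = content.replace(/\s+/g, ' ').trim();
    if (text !== '' && texts.indexOf(text) === -1) {
      texts.push(text);
    }
    match = pattern.exec(code);
  }
  return texts;
}
```

A Translate element that wraps another element, for example one containing bold text, is silently skipped, which is exactly the kind of gap an abstract syntax tree avoids.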

Now that we understand the complexity of maintaining a custom solution, we are in a better position to choose from the existing libraries out there. We want a library that provides an explicit translation function, interpolation, automatic parsing of the code files and, ideally, a declarative component. Well, it's our lucky day 🍾

Standard code: react-i18next

Why reinvent the wheel when there is an open-source, well-tested and working library out there? It supports interpolation, a declarative component and a proper parser (e.g. it will not extract text that is commented out) to extract the translatable text from the code files. Say hi to your new ally: i18next.

Even though the documentation can be tricky, setting up i18next is simple. We just need to install the package, along with the React flavour dependency (react-i18next), and initialize it with the default language and the resources in the following fashion.

npm install --save i18next react-i18next

Next, just like we did before for our custom translation code, we need to add a language selector to the UI. This time we don't need to store the selected language as part of the application state though, as i18next will take care of that:

We are now ready to start using the i18next translation capabilities. In function components, such as our sample app, we can obtain the translation function by using the useTranslation hook. We need to use the hook for React to re-render the component when the selected language changes. The declarative component, Trans, is available out of the box. Putting it all together:

Only one thing left to do: automatically extracting translatable text into resource files. We will need a parser for that. Here I'll be using i18next-parser. It requires a fair bit of configuration, but it supports TypeScript without additional steps. By trial and error I got it working with the following parameters. Start here and tweak it to meet your needs:

npm install --save-dev i18next-parser

Once you get the configuration right, running it will produce one resources file per language, which we will need to import from the app. Note that i18next uses namespaces, expecting the resources for each language to be nested under a translation property by default. To work around that when not using namespaces, we can explicitly generate an object that matches the i18next expectations:

npx i18next --config ./i18next-parser.config.js
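Nesting each language under the default translation namespace can be done with a small helper. This is a sketch under the assumption that each generated file has already been imported as a flat dictionary; `toI18nextResources` and the sample dictionaries are hypothetical.

```typescript
type Dictionary = Record<string, string>;

// Wraps each language dictionary in the default "translation" namespace,
// producing the shape i18next expects in its "resources" option.
function toI18nextResources(
  byLanguage: Record<string, Dictionary>
): Record<string, { translation: Dictionary }> {
  const result: Record<string, { translation: Dictionary }> = {};
  Object.keys(byLanguage).forEach((language) => {
    const dictionary = byLanguage[language];
    if (dictionary !== undefined) {
      result[language] = { translation: dictionary };
    }
  });
  return result;
}

// Hypothetical dictionaries standing in for the imported JSON files.
const i18nextResources = toI18nextResources({
  en: { 'Click me': 'Click me' },
  es: { 'Click me': 'Haz clic' },
});
```

The resulting object can then be passed as the `resources` option when initializing i18next.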

According to the documentation it is possible to disable namespaces. I spent a good while trying to do so but I couldn't manage to.

Bonus: machine generated content

As the application grows it won't be feasible to populate the resource files manually. Here is where a tool to manage the resource files can come in handy. There must be a bunch of tools for this purpose; I confess I haven't done any research on that. Here I'm suggesting BabelEdit, the tool we use at my current company. It requires purchasing a license but it does a good job and it comes with several useful features:

Generating machine based translations.
Filtering text that is empty for at least one language, or text that is equal to the default language.
Managing the approval status of each text and language. This can be used to keep track of which text has already been reviewed by human translators.
Exporting the translations to a single CSV file containing all (or a subset of) the languages. CSV is the format that translators usually expect.
Importing a CSV file containing the translations for several languages into the corresponding JSON files. When the translators send the CSV file back, this will update the JSON files automatically.

To start using it we need to create a new project (Generic JSON will do), import the JSON resource files and set the primary language. BabelEdit saves the project in an XML file, which we will want to commit to the repository. From now on, populating the resource files will be a matter of a few clicks!

Setting the primary language of a BabelEdit project

Resources view in BabelEdit (pre-translated)

Conclusions

The main challenge when it comes to translating an app is detecting the text that needs to be translated and extracting it to resource files. Using a parser to automatically extract such text makes the translation almost transparent from the developer's point of view. And, if the parser is included in the continuous integration pipeline, it will make sure the resource files never go out of date.

HTML/JSX parsers are not easy to implement and maintain, however, so it's worth using existing open-source libraries. Do your own research and pick the tools that best suit your needs. If you are happy to take my recommendation and go with react-i18next, here is the sample code repository for you to start fiddling.