G11N: The Gateway to Software Globalization

A Gentle Introduction to Source Code Internationalization, Localization, and the Purpose of a Translation Memory

Qasim Khawaja
8 min read · Nov 25, 2020
A painting of the world with the title “G11N: The Gateway to Software Globalization”

Table of Contents

· Introduction
· Demo
· What is g11n?
· Challenges in Creating a Global Software Product
  · Common Problems
· Creating Software for Multiple Locales
· Source Code Internationalization (i18n)
· Content Localization (l10n)
· Translation Memory (TM)
· Conclusion

Introduction

When I first started my internship with the Globalization team at Intuit, I had no idea how huge tech companies released software that worked across multiple regions in multiple languages. Over the past year, I have had the opportunity to learn more about how software globalization works. In this post, I introduce some important terms in the hope of helping others who are where I was a year ago. Just for fun, I also decided to apply some of these concepts in my own personal portfolio project.

Demo

This is my personal portfolio site built on the Next.js/Vercel stack using locize for localization management. The idea was to create a multi-region multilingual portfolio site.

Here is a link to the site: qasimkhawaja.com

Demo: Grid of 3 different versions of the site. The versions from left to right: English Canada, English Pakistan, Urdu Pakistan

I have also attached the full source code of the project, including documentation, for anyone who is interested in the implementation.

What is g11n?

Globalization (g11n) of software is a combination of internationalization, localization, and translation. Internationalization (i18n) means designing software so that it can support expansion across multiple locales. Localization (l10n) is the actual process of building a product to serve a particular locale or language. Through the lens of software engineering, if localization is the implementation, then internationalization is its interface. The abbreviations g11n, i18n, and l10n follow a simple pattern: the number in each abbreviation is the count of letters between the first and last letter of the term. For example, localization has 10 letters between the “l” and the “n”, so it becomes “l10n.”
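The abbreviation pattern is simple enough to express in a few lines. Here is a minimal Python sketch; the function name numeronym is my own choice for illustration.

```python
def numeronym(term: str) -> str:
    """Abbreviate a term as first letter + number of inner letters + last letter."""
    if len(term) <= 3:
        # Too short to shorten meaningfully, so return it unchanged.
        return term.lower()
    inner_count = len(term) - 2  # everything between the first and last letter
    return f"{term[0]}{inner_count}{term[-1]}".lower()


for word in ("Globalization", "Internationalization", "Localization"):
    print(word, "->", numeronym(word))
```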

Challenges in Creating a Global Software Product

Software that supports a region’s local language attracts more local customers, because users feel more comfortable using something that they understand better.

An example of Airbnb’s interface in another language.

Companies like Intuit, Microsoft, Apple, and Google do not limit themselves to their original country and language. Instead, they work to optimize the localization process and treat globalization as foundational to their growth. This is because sales generated in other countries have the potential to eclipse the product’s revenue in its original locale. However, creating software that supports multiple locales is a challenging task. Here are just a few common problems that arise along the way:

Common Problems

Text length and layout direction vary between languages: A phrase in English may take up more characters than its equivalent in Japanese. Screen design needs to consider the spacing requirements of different languages.

Data format differences: The date format in the United States is “mm/dd/yyyy”, while in many other countries it is “dd/mm/yyyy.” Western numbering systems also diverge from those used in South Asia (for example, in India and Pakistan) from the fifth power of ten onward. Different countries have different currencies and measurement units. Some observe daylight saving time, while others do not. Another common problem is address standardization, as many validation routines only accept U.S. states and zip codes. Address formats differ significantly, even amongst western countries.

Example: Western number systems typically introduce a new name at every third power of ten, such as million (10⁶), billion (10⁹), trillion (10¹²), and so forth. The systems used in the subcontinent introduce a new name at every second power of ten starting from the fifth (10^(5+2n) for n = 0, 1, 2, ...), such as lakh or lac (10⁵), crore (10⁷), arab (10⁹), and so forth.
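The difference also shows up in how digits are grouped. Here is a rough Python sketch of the two grouping styles. The function names are hypothetical, and a production app would normally rely on an i18n library rather than hand-rolled code like this.

```python
def group_western(n: int) -> str:
    """Group digits in threes: 10,000,000."""
    return f"{n:,}"


def group_south_asian(n: int) -> str:
    """Group the last three digits, then the rest in twos: 1,00,00,000."""
    digits = str(abs(n))
    head, tail = digits[:-3], digits[-3:]
    chunks = []
    while head:
        # Peel off two digits at a time, working from the right.
        chunks.insert(0, head[-2:])
        head = head[:-2]
    grouped = ",".join(chunks + [tail])
    return ("-" if n < 0 else "") + grouped


one_crore = 10 ** 7
print(group_western(one_crore))      # ten million in the western style
print(group_south_asian(one_crore))  # one crore in the South Asian style
```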

Cultural differences: Phrases, symbols, and greetings may have different meanings in different cultures. Idiomatic copy does not translate well. Images, symbols, colours, and text are all sensitive to locale.

Example: Many U.S. forms (especially ballot papers) use ‘×’ to mark a choice as selected. However, in many other countries, the ‘×’ mark signifies negation.

The checkmark ‘✓’ typically means affirmative in the English-speaking world. However, in Finland, it can mean wrong, as it resembles a slanted ‘V’. In Japan and Korea, ‘o’ means correct, and ‘×’ or ‘✓’ means wrong.

Example: Confusion with a checkmark in Finnish

User experience: UI design needs to accommodate variations in spacing and density, and also needs to support languages that are written from right to left.

Example: Screen design needs to render correctly for right-to-left oriented languages, such as Arabic.
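One small piece of this is deciding the text direction for a given locale, which a web page can then apply to its layout. The Python sketch below is only an illustration: the set of right-to-left languages is deliberately partial, and the function name is hypothetical.

```python
# A partial, hand-picked set of right-to-left language codes (an assumption
# for this sketch; a real project would take this from its i18n library).
RTL_LANGUAGES = {"ar", "fa", "he", "ur"}


def text_direction(locale_tag: str) -> str:
    """Return "rtl" or "ltr" for a locale tag such as "ur-PK" or "en_CA"."""
    language = locale_tag.replace("_", "-").split("-")[0].lower()
    return "rtl" if language in RTL_LANGUAGES else "ltr"


for tag in ("en-CA", "en-PK", "ur-PK"):
    print(tag, text_direction(tag))
```

Note that the direction follows the language, not the region: English in Pakistan is still left to right, while Urdu in Pakistan is right to left.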

Creating Software for Multiple Locales

A user will generally feel more confident using software in their native language and may prefer such a product over similar competitor products, all other things being equal. Customers are more likely to buy products from websites or apps developed in their native language. Tailoring the product to the target market’s language is critical for the success of a global software product.

A locale defines a region where people speak a language or its variants. A locale is not simply a country, as many countries have multiple languages, and many regions have their own languages. Localization (l10n) includes adapting the original content to be relevant in the target locale.
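In code, a locale is commonly written as a language code plus an optional region code, such as "en-CA" or "ur-PK". Here is a simplified Python sketch of parsing such a tag and building a fallback order. It ignores script subtags and other parts of real locale identifiers, and all names in it are my own.

```python
from typing import List, NamedTuple, Optional


class Locale(NamedTuple):
    language: str
    region: Optional[str]


def parse_locale(tag: str) -> Locale:
    """Split a simple tag like "ur-PK" or "en_ca" into language and region."""
    parts = tag.replace("_", "-").split("-")
    language = parts[0].lower()
    region = parts[1].upper() if len(parts) > 1 else None
    return Locale(language, region)


def fallback_chain(tag: str, default: str = "en") -> List[str]:
    """Most specific locale first, then the bare language, then a default."""
    locale = parse_locale(tag)
    chain = []
    if locale.region:
        chain.append(f"{locale.language}-{locale.region}")
    chain.append(locale.language)
    if default not in chain:
        chain.append(default)
    return chain


print(fallback_chain("ur-PK"))
print(fallback_chain("en-CA"))
```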

A product that supports internationalization (i18n) is easily expandable in multiple geographies and across various demographics.

The best time to support i18n in the product is in its initial design phase. Redesigning software to suit multiple languages is a costly process that often requires budgets and resources equal to the original effort. It's also crucial to translate not only the software’s active screens but also supporting material such as help pages, manuals, and API documentation.

Source Code Internationalization (i18n)

No language-specific text should be hardcoded in the source code. The ideal way to achieve this is to write utility methods that fetch locale-specific data from a data dictionary. Data dictionaries typically consist of key-value pairs specific to each locale. The value corresponding to a key is the actual text that displays on the screen. The dictionary is usually a set of files in JSON, YAML, PO, or one of the other standard formats.
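To make this concrete, here is a minimal Python sketch of such a utility method. The keys, the inline JSON, and the translate function are all hypothetical; in a real project each dictionary would live in its own file and usually be loaded through an i18n library.

```python
import json

# Hypothetical per-locale dictionaries. Inline JSON stands in for files
# such as en.json and ur.json.
DICTIONARIES = {
    "en": json.loads('{"greeting.welcome": "Welcome", "nav.projects": "Projects"}'),
    "ur": json.loads('{"greeting.welcome": "خوش آمدید"}'),
}


def translate(key: str, locale: str, default_locale: str = "en") -> str:
    """Look up a key for a locale, falling back to the default locale."""
    for candidate in (locale, default_locale):
        value = DICTIONARIES.get(candidate, {}).get(key)
        if value is not None:
            return value
    # Returning the key itself makes missing translations easy to spot.
    return key


print(translate("greeting.welcome", "ur"))
print(translate("nav.projects", "ur"))  # not translated yet, falls back to English
```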

A few tips for internationalization:

  1. Keep the dictionary ordered and avoid duplicate keys in it.
  2. Do not concatenate strings. Instead, include punctuation as part of the copy.
  3. Pass dynamic data to the translation methods so that it can be interpolated into the dictionary text itself.
  4. Word order and sentence structure matter and need translation experts. It is best to store whole sentences and paragraphs in the dictionary to preserve their meaning.
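Tips 2 to 4 can be sketched together. In the Python example below, the dictionary stores a whole sentence per locale, punctuation included, with named placeholders for the dynamic data. The keys, the German entry, and the t helper are hypothetical choices for illustration only.

```python
MESSAGES = {
    # Whole sentences with punctuation, so each translator controls word order.
    "en": {"post.byline": "Posted by {author} on {date}."},
    "de": {"post.byline": "Am {date} von {author} veröffentlicht."},
}


def t(key: str, locale: str, **params: str) -> str:
    """Fetch a sentence template and interpolate the dynamic values into it."""
    template = MESSAGES[locale][key]
    return template.format(**params)


print(t("post.byline", "en", author="Qasim", date="11/25/2020"))
print(t("post.byline", "de", author="Qasim", date="25.11.2020"))
```

Building the same sentence with concatenation, such as "Posted by " + author + " on " + date, would hardcode the English word order and leave the German translator no way to move the date to the front.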

Screen design requires considerable thought because some languages are written right to left and because the amount of text needed to convey the same meaning varies. You also need to know the encoding of a string to interpret or display it correctly. That's why it’s important to specify the character encoding (for example, UTF-8) used for the pages.
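Here is a short Python sketch of what goes wrong when the encoding is guessed incorrectly. It encodes the Urdu word for "Urdu" as UTF-8 and then decodes the same bytes twice, once with the right encoding and once with Latin-1.

```python
text = "اردو"  # the word "Urdu", written in Urdu

raw = text.encode("utf-8")       # bytes as they would travel over the network
correct = raw.decode("utf-8")    # decoded with the encoding that was declared
garbled = raw.decode("latin-1")  # decoded with a wrong guess

print(len(text), "characters became", len(raw), "bytes")
print("utf-8  :", correct)
print("latin-1:", garbled)
```

The Latin-1 decode does not raise an error. It silently produces unreadable text, which is why declaring the encoding explicitly matters.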

Currency symbols, date formats, and measurement units also change with locale. Some programming languages and i18n libraries provide useful methods to handle this.

For example, format timestamps at display time, since their representation changes with the locale.
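A minimal Python sketch of this idea follows: store one timestamp, then pick the pattern when rendering. The pattern table is a hypothetical stand-in with only two locales. An i18n library would normally supply these patterns for every locale.

```python
from datetime import datetime, timezone

# Hypothetical pattern table for this sketch only.
DATE_PATTERNS = {
    "en-US": "%m/%d/%Y",  # mm/dd/yyyy
    "en-GB": "%d/%m/%Y",  # dd/mm/yyyy
}


def format_date(moment: datetime, locale: str) -> str:
    """Render the same stored timestamp differently depending on the locale."""
    pattern = DATE_PATTERNS.get(locale, "%Y-%m-%d")  # ISO-style fallback
    return moment.strftime(pattern)


published = datetime(2020, 11, 25, tzinfo=timezone.utc)
for tag in ("en-US", "en-GB"):
    print(tag, format_date(published, tag))
```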

Content Localization (l10n)

Localization (l10n) is the actual adaptation of the content to suit a particular locale. It may include writing code that dynamically picks the translated text and symbols for a particular locale. Localization often requires linguistic experts who have a deep understanding of the language and culture. A translation memory helps by storing repeated text and ensuring consistent translations across a set of experts.

Example: Word-for-word translation of idiomatic text rarely produces a correct translation.
Example: Localizing a string for the PK region.

Translation Memory (TM)

Example: A simplified flow of how a Translation Memory (TM) is used.

A translation memory (TM) is a database that stores previously translated sentences, paragraphs, and segments of text to aid human translators. A translation unit is an entry in the translation memory, which pairs the original-language segment (the source) with its translation in the destination language (the target). While a translator works on new documents, a translation memory automatically suggests previously stored identical or similar matches. It is an essential aspect of localization that improves the speed and quality of translation.
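To give a feel for the idea, here is a toy Python sketch of a translation memory with exact and similar matching. It uses the standard library's difflib for a rough similarity score. Real TM systems use their own matching algorithms, and the class names, threshold, and French sample data here are assumptions for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Tuple


@dataclass(frozen=True)
class TranslationUnit:
    source: str  # segment in the original language
    target: str  # its stored translation


class TranslationMemory:
    def __init__(self) -> None:
        self.units: List[TranslationUnit] = []

    def add(self, source: str, target: str) -> None:
        self.units.append(TranslationUnit(source, target))

    def suggest(self, segment: str, threshold: float = 0.75) -> List[Tuple[float, TranslationUnit]]:
        """Return (score, unit) pairs at or above the threshold, best first."""
        matches = []
        for unit in self.units:
            score = SequenceMatcher(None, segment.lower(), unit.source.lower()).ratio()
            if score >= threshold:
                matches.append((score, unit))
        matches.sort(key=lambda pair: pair[0], reverse=True)
        return matches


tm = TranslationMemory()
tm.add("Save your changes", "Enregistrez vos modifications")

for score, unit in tm.suggest("Save your changes now"):
    print(f"{score:.2f}", unit.source, "=>", unit.target)
```

A score of 1.0 is an exact match. Anything lower is a similar match that the translator still has to adapt.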

The benefits of using a translation memory are complete document translation, acceleration of the translation process, cost reduction, and consistency when employing several translators. Translation memory systems tend to be centralized and server-based, although some are serverless. They use textual or linguistic parsing, segmentation, alignment, and term extraction to provide perfect or partial matches, depending on the algorithms they employ.

A translation memory also requires a tool that automatically retrieves and substitutes text as a translator moves through the document. For exact matches, this software reuses the old translation, which may be wrong in the new context and need manual correction by a reviewer.
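That substitution step might look roughly like the Python sketch below. It fills in exact matches from a plain dictionary standing in for the translation memory and flags every reused translation for review. The Segment class, the pretranslate function, and the sample entries are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Segment:
    source: str
    target: Optional[str] = None  # None means a translator must write it
    needs_review: bool = False    # True when a stored translation was reused


def pretranslate(sources: List[str], memory: Dict[str, str]) -> List[Segment]:
    """Substitute exact matches from memory and flag them for a reviewer."""
    result = []
    for source in sources:
        target = memory.get(source)
        result.append(Segment(source, target, needs_review=target is not None))
    return result


# "Open" was stored as a verb ("Ouvrir"), but as a status label the French
# would be "Ouvert", so the reused translation must be checked in context.
memory = {"Open": "Ouvrir"}
for segment in pretranslate(["Open", "Close the window"], memory):
    print(segment)
```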

Literary or creative text, which expects accurate translation of the text’s message and not just its component sentences, requires manual translation. Technical manuals or text with highly repetitive special vocabulary may be more successful when using translation memory.

Example: How to add keys in a TM (This is also the TMS that I used in my demo project.)

Conclusion

In an ever-expanding world, it is imperative for software products to engage users in their native languages. Software globalization (g11n) allows for designing and building software that both feels local and expands globally. Although it is challenging to develop software that runs in various locales, the result is worth the effort. For a business that aims to expand globally, and even regionally, it is necessary to solve the language puzzle. A robust globalization ecosystem is crucial to making localization repeatable, so that it can keep up with the speed and scale at which the business grows.
