Things to Consider for Sinhalese Translation

Sinhala is the main language of communication for Sri Lankans. Sri Lanka is home to Sinhalese, Tamils and Moors, but the nation uses Sinhala as its business language. Sinhala is one of two official languages in Sri Lanka, the other being Tamil. The population of Sri Lanka in 2012 was estimated at 20,271,464; of that number, 74% are Sinhalese, and the rest use Tamil for educational purposes. Sri Lanka’s literacy rate is 91.2%, but those who know English are far fewer in number.

This globally active nation has been selected by many leading enterprises as one of their target markets, so companies tend to internationalize software, operating systems, web content, books, manuals, and administrative materials for Sinhalese speakers. Let us review some common issues companies encounter both at the very early stages of Sinhalese translation and in the very last stages of validation for website internationalization projects.

Sinhala is a complex language with three different alphabets. These are the Pure Sinhala alphabet (32 letters), the Mixed Sinhala alphabet (54 letters), and the Modern Sinhala alphabet (60 letters). As you can imagine, the QWERTY keyboard does not have enough keys to accommodate the number of characters in the Modern Sinhala alphabet, so electronic input accessory devices have been developed to manage the Sinhala keyboard, albeit not easily. The set of vowel modifiers associated with Sinhala letters makes matters worse.

Fortunately, however, Sinhalese does not have capitalization, so the upper-case positions on the QWERTY keyboard can be used for additional characters. It is difficult to imagine how mobile devices manage this, as the QWERTY keyboard is barely managing. There are a few input method editors available; some were developed internationally and others locally. In addition, some are compatible with Unicode fonts, while others are not.

Most magazines, books, booklets and promotional materials are produced in standard (non-Unicode) fonts. Many standard fonts have been created by various publishers, and they allow desktop publishers to make their layouts more beautiful. However, they are not very compatible with web technologies, so these layouts are published in image format on web sites. They are therefore not compatible with search engine and text prediction technologies.
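As a rough illustration of why this matters for search, the following Python sketch measures how much of a piece of text is actually encoded in the Unicode Sinhala block (U+0D80 to U+0DFF). The function name sinhala_ratio and both sample strings are assumptions made for this example; in particular, the ASCII sample only stands in for what a non-Unicode font might store underneath its Sinhala glyphs.

```python
def sinhala_ratio(text):
    """Return the share of non-whitespace characters in the Sinhala block."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    sinhala = sum(1 for ch in chars if "\u0d80" <= ch <= "\u0dff")
    return sinhala / len(chars)


# The word "Sinhala" typed with a Unicode editor.
unicode_text = "\u0dc3\u0dd2\u0d82\u0dc4\u0dbd"

# Hypothetical sample: a standard font may display Sinhala glyphs,
# while the stored characters are plain ASCII.
legacy_text = "isxy,"

print(sinhala_ratio(unicode_text))
print(sinhala_ratio(legacy_text))
```

A ratio near zero on a page that is meant to be in Sinhala suggests the text is stored in a font-specific encoding, which a search engine would index as meaningless Latin characters. Punctuation, digits and joiner characters lower the ratio too, so it should be read as a rough signal only.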

Some input method editors that are compatible with standard fonts do not have keys assigned for punctuation marks (all three end stops) and some vowel modifiers. Because of this, file type conversions often cause font corruptions. File sharing also generates problems such as font corruptions and spacing or settings corruptions, so the recipient often does not receive what was sent. If the recipient does not know how to correct these corruptions, the condition of the file worsens further with each repeated transfer.
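For text that is already in Unicode, one way to catch this kind of damage early is to compare the text before and after a conversion or transfer, code point by code point. The sketch below is only an illustration: the helper names describe and first_difference are invented for this example, and the "received" string simulates one plausible failure (a dropped zero width joiner) rather than the output of any particular tool.

```python
import unicodedata


def describe(ch):
    """Label a character with its code point and Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}"


def first_difference(before, after):
    """Return (index, before_char, after_char) for the first mismatch, or None.

    Both strings are normalized to NFC first, so that equivalent
    spellings of the same vowel sign do not count as corruption.
    """
    a = unicodedata.normalize("NFC", before)
    b = unicodedata.normalize("NFC", after)
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i, describe(x), describe(y)
    if len(a) != len(b):
        i = min(len(a), len(b))
        left = describe(a[i]) if i < len(a) else "<end of text>"
        right = describe(b[i]) if i < len(b) else "<end of text>"
        return i, left, right
    return None


# Simulated example: the joiner inside a conjunct is lost in transfer.
sent = "\u0d9a\u0dca\u200d\u0dbb"
received = "\u0d9a\u0dca\u0dbb"

print(first_difference(sent, sent))
print(first_difference(sent, received))
```

Running a check like this after every conversion step shows where the corruption was introduced, which is far cheaper than discovering it after several transfers.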

Rather than criticizing these editors, it is worth looking at how Unicode font editors solved this problem. Unlike standard font keyboards, Unicode keyboards do not assign letter characters to the symbol keys at the top of the QWERTY keyboard. Instead, they introduced a separate set of letters that can be activated by pressing and holding the ALT key and then pressing the corresponding letter key. To ensure the quality of translated documents, I used to type in a Unicode font first and then convert the text into a standard font using web-based conversion engines.

This process is very effective but still has a few issues, so you need to re-check the text after conversion. Most Unicode editors generate inherent faults while typing. One such fault is the failure to apply vowel modifiers. However, this is only a problem with Standard Wijesekare keyboard editors. It can generally be avoided by refreshing the desktop once (i.e., minimizing all working windows and then refreshing the desktop). In some instances you may find that your editor generates very poor modifications when you are working in translation workbenches.

To avoid this, you must close the workbench and restart it with the “Run as Administrator” command. If this does not solve your problem, then you’re better off trying a Romanized keyboard. These are often called Singlish (Sin(hala)+(En)glish) keyboards. One popular editor has word prediction capability to save time, but it will not give much of an advantage, as typing with Romanized editors takes at least two and a half times as long as typing with Wijesekare keyboards.

Another notable drawback with Unicode font editors is that they generate “unsolvable?” corruption issues with document mapping. If a project manager assigns a project without knowing this, he or she is likely to get trapped when performing the layout. If this is a massive issue with your document and you don’t have “Find and Replace” editing capacity in the particular desktop publishing software you are using, all the content may need to be retyped, which is a problem.

It is unfortunately common to see web sites containing corrupted content, even those owned by the Sri Lankan government. As I’ve mentioned, three kinds of font corruptions can be seen: either vowel modifiers aren’t used, certain classes of letters are corrupted (letters modified with rakaranshaya and yanshaya are the most common issue) or corruption occurs due to an unsupported editor. The first two can be fixed, but not the last.
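Parts of the first two kinds of corruption can be screened for automatically when the content is in Unicode. The Python sketch below rests on two assumptions that should be checked against your own material: that vowel modifiers are encoded as combining marks in the Sinhala block, and that the rakaranshaya and yanshaya forms are encoded as al-lakuna (U+0DCA) plus zero width joiner (U+200D) plus the letter ra or ya. The function names are invented for this example. A modifier that was never typed cannot be detected this way; the check only finds modifiers stranded without a base letter, and counts the conjunct forms so that they can be compared against a trusted copy.

```python
import unicodedata

ZWJ = "\u200d"
AL_LAKUNA = "\u0dca"
RAKARANSHAYA = AL_LAKUNA + ZWJ + "\u0dbb"  # joined form of the letter ra
YANSHAYA = AL_LAKUNA + ZWJ + "\u0dba"      # joined form of the letter ya


def in_sinhala_block(ch):
    return "\u0d80" <= ch <= "\u0dff"


def is_sinhala_mark(ch):
    """True for combining marks (vowel modifiers etc.) in the Sinhala block."""
    return in_sinhala_block(ch) and unicodedata.category(ch).startswith("M")


def is_sinhala_letter_or_mark(ch):
    return in_sinhala_block(ch) and unicodedata.category(ch)[0] in "LM"


def orphan_marks(text):
    """Indexes of Sinhala marks that do not follow a Sinhala letter or mark."""
    hits = []
    for i, ch in enumerate(text):
        if not is_sinhala_mark(ch):
            continue
        if i == 0 or not is_sinhala_letter_or_mark(text[i - 1]):
            hits.append(i)
    return hits


def conjunct_counts(text):
    """Count the joined ra and ya forms found in the text."""
    return {
        "rakaranshaya": text.count(RAKARANSHAYA),
        "yanshaya": text.count(YANSHAYA),
    }


good = "\u0d9a\u0dca\u200d\u0dbb\u0dd2"  # ka + joined ra + vowel modifier
damaged = "\u0d9a\u0dca\u0dbb \u0dd2"    # joiner lost, modifier split off

print(orphan_marks(good), conjunct_counts(good))
print(orphan_marks(damaged), conjunct_counts(damaged))
```

If the counts for a published page are lower than those for the source text, or if any stranded modifiers are reported, the page is worth inspecting by eye before it goes live.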

Finally, “last-minute font corruptions” can be seen in software localization cycles, and I think those are caused by file type conversions. That is really an issue of resource consumption to be overcome. In addition, it is important to know that most (or probably all) Unicode fonts have glyph errors, so people tend to seek out standard fonts. A number of beautiful fonts are available.

In short, it’s better to know what technical obstacles you may face before launching a website localization project, in Sinhalese or any other language.

About Dewage Kumara

I am a native Sinhalese translator living in Sri Lanka. I work in the areas of website localization, software localization, translation terminology management, and website internationalization.