Yugoslavia: Sun’s Java Localisation Fiasco
2008-11-30
In the 1990s and the first decade of the 21st century, the European country of Yugoslavia violently broke up into a number of now-separate countries. The Socialist Federal Republic of Yugoslavia (SFRJ) had been home to more than 23 million people living in six republics, hosting a multitude of ethnic groups, speaking several languages and using both the Latin and Cyrillic writing systems. During the long and painful break-up, the situation became even more complex: what used to be considered mere regional variants of the same language, Serbo-Croatian, has now become a set of at least four languages that are claimed to be separate: Serbian, Croatian, Bosnian and Montenegrin. Serbian is written in either Cyrillic or Latin and chiefly spoken in Serbia, Bosnia and Montenegro. But not only there: Serbs living in Croatia probably consider their mother tongue Serbian as well, and in Montenegro, the question of who speaks Serbian or Montenegrin, and what the differences are, doesn’t seem to be sorted out yet. In Bosnia, the three main ethnic groups are Bosniaks, Serbs, and Croats, supposedly speaking Bosnian, Serbian, and Croatian, respectively. The original SFRJ was first reduced to the Federal Republic of Yugoslavia, which was later renamed “Serbia and Montenegro” and then split into the separate countries of Serbia and Montenegro. Macedonia is struggling with Greece over the use of its constitutional name, resulting in the clumsy moniker “Former Yugoslav Republic of Macedonia”. Who knows what’s next?
The former Yugoslavia poses a complex task for anyone involved in localising software. Languages have come and gone, countries have come and gone or simply changed names. But with some good will and competence, it’s a situation that can be handled gracefully. Microsoft, of all companies, has apparently managed to do so in its software products. Sun’s behaviour with its flagship Java language and runtime environment, on the other hand, has consistently appeared both stupid and arrogant – the worst possible combination.
What’s the problem, anyway?
Right from the beginning, Java was an attractive platform for developing globalised software: full Unicode support throughout and a standard runtime library that provided assistance for localising applications. This included support for various writing systems, languages, calendars, local date formats, and so on. However, it wasn’t quite perfect.
To begin with, Sun of course couldn’t ship full localisation info for every locale in the world; in fact, they only do so for very few of them. In May 1997, more than eleven years ago at the time of this writing, someone realised that the real problem wasn’t that other locale settings weren’t provided out of the box. The problem was that you could not easily add additional ones yourself: you only ever get whatever Sun bundles with the JRE. Sure, you can write your own locale-specific versions of DateFormatSymbols and whatever else you need. But then the elegant automatic mechanism of just instantiating a Locale object and getting the desired behaviour from all Locale-aware classes still does not work. You would have to feed each of your localised classes manually into whatever code needs them. Ugly. Painful. Stupid. And if you use a third-party library that doesn’t provide an interface for that, you’re out of luck. Sun, however, only rated this a problem of “Priority: 4-Low”.
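To make the manual workaround concrete, here is a minimal sketch of what it might look like. The class name `ManualBosnianDates`, the Latin-script month spellings and the pattern `d. MMMM yyyy.` are assumptions chosen for illustration; only `DateFormatSymbols`, `SimpleDateFormat` and `Locale` are standard JRE classes.

```java
import java.text.DateFormatSymbols;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

public class ManualBosnianDates {

    // Illustrative Latin-script month names; the exact spellings are an assumption.
    static final String[] MONTHS = {
        "januar", "februar", "mart", "april", "maj", "juni",
        "juli", "august", "septembar", "oktobar", "novembar", "decembar"
    };

    // Builds the formatter by hand: start from some bundled symbols
    // and overwrite the month names with our own.
    static SimpleDateFormat longDateFormat() {
        DateFormatSymbols symbols = new DateFormatSymbols(Locale.ENGLISH);
        symbols.setMonths(MONTHS);
        return new SimpleDateFormat("d. MMMM yyyy.", symbols);
    }

    // month is 1-based here for readability; Calendar months are 0-based.
    static String formatLong(int year, int month, int day) {
        Calendar cal = new GregorianCalendar(year, month - 1, day);
        return longDateFormat().format(cal.getTime());
    }

    public static void main(String[] args) {
        System.out.println(formatLong(2008, 11, 29));
    }
}
```

The catch is visible in the code itself: every place that formats a date has to call `longDateFormat()` explicitly. Code that simply asks for `DateFormat.getDateInstance(DateFormat.LONG, someLocale)` never sees these symbols.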
In 2003, a geopolitically interested person noticed that the country named Yugoslavia didn’t exist anymore and demanded that the corresponding locales be deleted. This time, Sun reacted promptly and duly purged all traces of the past from the Java 1.5 VM. The committer sensibly promised to redirect all references of country code YU to CS (Serbia and Montenegro) and of language code “sh” (Serbo-Croatian) to “sr” (Serbian). Good idea! Unfortunately, he or she was either fired or promoted to management before actually doing so. The result? No more support for Serbian and Montenegrin Java users, some of whom were now angrily demanding that this be fixed quickly. After all, you don’t expect functionality to just disappear without warning from one Java version to the next, neither in the official API nor in the localisation database. Sun, of course, chose to think of it as a non-problem and ignored them.
In the same year, a Slovenian user complained that some of the definitions (e.g. date formats) for his locale were simply wrong. Sun’s response came within a month, promising a fix for the issues – but it wouldn’t come soon, and it would only affect the date format. The bug reporter came back again, clearly stating other issues, providing references and pointing out that some problems also affected the locales for Croatian, Serbian, and Macedonian users. And, as others had done before, he mentioned that Microsoft had gotten these things right. Why not Sun? Alas, no response.
Two years later, in early 2005, the time was ripe for another bug report. Where are the Serbian and Montenegrin locales? Sun assured the bug reporter that everything was perfectly fine, apparently suggesting that the two republics had somehow been wiped off the map – like a Micronesian island state drowned by rising sea levels. Bug closed! But after a string of protests, and after some users had taken the matter into their own hands, Sun eventually got the message: almost exactly one year after the original bug report, it was finally re-opened as a “request for enhancement”, promising integration in 1.6 and even a backport to 1.5.
Even in 2008, there is no support whatsoever for a Bosnian locale. For many purposes, it would be quite OK to use the Croatian locale instead. There are at least two problems with this, though: first, the Croatian locales use a highly unusual date format, apparently because that format was once defined as a national standard that was never actually adopted by people outside of the standards institutes. Second, Croatian has different names for the months than Bosnian and Serbian – not just slightly, but completely different. On the other hand, the Serbian locales are fairly well-defined and would be a usable option as well. Unfortunately, they are only defined in Cyrillic.
And Today?
With the release of Java 1.6, finally that decade-old feature request has been addressed by introducing Locale Service Provider classes. This way, locale data not provided by Sun can be easily and transparently added by third parties. While this is a big relief for some, it’s not a solution for those dealing with built-in locale data that are simply wrong: the bundled locale-specific classes are always searched first. For an example, check my Java Locale Service Provider for Bosnian/Croatian/Serbian which attempts to fix some of the problems described here.
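As a rough illustration of the mechanism (not the provider mentioned above), a date-symbols provider could be sketched like this. The class name `BosnianSymbolsProvider` and the month spellings are hypothetical; `java.text.spi.DateFormatSymbolsProvider` and its two abstract methods are part of the Java 1.6 API.

```java
import java.text.DateFormatSymbols;
import java.text.spi.DateFormatSymbolsProvider;
import java.util.Locale;

// Hypothetical provider that contributes date symbols for the "bs" locale.
// On Java 6 it would be packaged in a JAR that contains the file
// META-INF/services/java.text.spi.DateFormatSymbolsProvider naming this class,
// and installed as an extension (for example in the JRE's lib/ext directory).
public class BosnianSymbolsProvider extends DateFormatSymbolsProvider {

    private static final Locale BOSNIAN = new Locale("bs");

    // Illustrative Latin-script month names; the exact spellings are an assumption.
    private static final String[] MONTHS = {
        "januar", "februar", "mart", "april", "maj", "juni",
        "juli", "august", "septembar", "oktobar", "novembar", "decembar"
    };

    // Tells the runtime which locales this provider can serve.
    @Override
    public Locale[] getAvailableLocales() {
        return new Locale[] { BOSNIAN };
    }

    // Called by the runtime when no bundled data exists for the locale.
    @Override
    public DateFormatSymbols getInstance(Locale locale) {
        if (!BOSNIAN.getLanguage().equals(locale.getLanguage())) {
            throw new IllegalArgumentException("Unsupported locale: " + locale);
        }
        DateFormatSymbols symbols = new DateFormatSymbols(Locale.ENGLISH);
        symbols.setMonths(MONTHS);
        return symbols;
    }

    public static void main(String[] args) {
        // Calling the provider directly, without registering it.
        DateFormatSymbols symbols = new BosnianSymbolsProvider().getInstance(BOSNIAN);
        System.out.println(symbols.getMonths()[10]);
    }
}
```

Because bundled data is consulted first, a provider like this can fill the gap for “bs” but cannot override what the JRE already ships for “hr” or “sr”.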
When will Sun get the message and finally do something about this sorry state of affairs, for instance by listening to its users and actively improving localisation quality? I’m hoping for OpenJDK, but I’m not holding my breath.
Some Tests on a Recent JRE
Tests run on the most recent officially released Sun JRE at the time of this writing: 1.6.0_10 on Windows XP.
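The test program itself is not reproduced here. Values like the ones below could be collected with a small class along the following lines; `LocaleReport` and its `report` method are hypothetical names, while the calls into `Calendar`, `DateFormat`, `DateFormatSymbols`, `Currency` and `NumberFormat` are standard API. The output depends entirely on the locale data of the JRE it runs on.

```java
import java.text.DateFormat;
import java.text.DateFormatSymbols;
import java.text.NumberFormat;
import java.util.Calendar;
import java.util.Currency;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class LocaleReport {

    // Collects the locale-dependent values for one locale, in table order.
    static Map<String, String> report(Locale locale) {
        Date date = new GregorianCalendar(2008, Calendar.NOVEMBER, 29).getTime();
        Map<String, String> rows = new LinkedHashMap<String, String>();

        // Calendar.SUNDAY is 1 and Calendar.MONDAY is 2.
        rows.put("First day of week",
                String.valueOf(Calendar.getInstance(locale).getFirstDayOfWeek()));
        rows.put("Date (long)", DateFormat.getDateInstance(DateFormat.LONG, locale).format(date));
        rows.put("Date (medium)", DateFormat.getDateInstance(DateFormat.MEDIUM, locale).format(date));
        rows.put("Date (short)", DateFormat.getDateInstance(DateFormat.SHORT, locale).format(date));
        rows.put("Month name", new DateFormatSymbols(locale).getMonths()[Calendar.NOVEMBER]);

        // Currency data only makes sense for locales that name a country.
        if (locale.getCountry().length() > 0) {
            String code;
            try {
                Currency currency = Currency.getInstance(locale);
                code = (currency == null) ? "none" : currency.getCurrencyCode();
            } catch (IllegalArgumentException e) {
                code = "unsupported country";
            }
            rows.put("Currency", code);
            rows.put("Currency amount", NumberFormat.getCurrencyInstance(locale).format(9999999.9));
        }
        return rows;
    }

    public static void main(String[] args) {
        Locale[] locales = {
            new Locale("bs"), new Locale("hr"), new Locale("hr", "HR"), new Locale("sr", "RS")
        };
        for (Locale locale : locales) {
            System.out.println(locale);
            for (Map.Entry<String, String> row : report(locale).entrySet()) {
                System.out.println("  " + row.getKey() + ": " + row.getValue());
            }
        }
    }
}
```

Note that an unsupported locale does not fail here; the JRE silently falls back to other data. Whether a locale is actually supported can be checked against `DateFormat.getAvailableLocales()`.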
bs (Bosnian)
missing – unsupported
hr (Croatian)
| First day of week | 2 | |
| Date (long) | 2008. studeni 29 | probably wrong, or at least very unusual |
| Date (medium) | 2008.11.29 | probably wrong, or at least very unusual |
| Date (short) | 2008.11.29 | probably wrong, or at least very unusual |
| Month name | studeni | |
hr HR (Croatian – Croatia)
| First day of week | 2 | |
| Date (long) | 2008. studeni 29 | probably wrong, or at least very unusual |
| Date (medium) | 2008.11.29 | probably wrong, or at least very unusual |
| Date (short) | 2008.11.29 | probably wrong, or at least very unusual |
| Month name | studeni | |
| Currency: | HRK | |
| Currency amount: | Kn 9.999.999,9 | |
mk (Macedonian)
| First day of week | 1 | probably wrong |
| Date (long) | 29, ноември 2008 | probably wrong, or at least very unusual |
| Date (medium) | 29.11.2008 | |
| Date (short) | 29.11.08 | |
| Month name | ноември | |
mk MK (Macedonian – Macedonia)
| First day of week | 1 | probably wrong |
| Date (long) | 29, ноември 2008 | probably wrong, or at least very unusual |
| Date (medium) | 29.11.2008 | |
| Date (short) | 29.11.08 | |
| Month name | ноември | |
| Currency: | MKD | |
| Currency amount: | Den 9.999.999,9 | |
sl (Slovenian)
| First day of week | 1 | wrong |
| Date (long) | Sobota, 29 november 2008 | |
| Date (medium) | 29.11.2008 | |
| Date (short) | 29.11.08 | |
| Month name | november | |
sl SI (Slovenian – Slovenia)
| First day of week | 1 | wrong |
| Date (long) | Sobota, 29 november 2008 | |
| Date (medium) | 29.11.2008 | |
| Date (short) | 29.11.08 | |
| Month name | november | |
| Currency: | EUR | |
| Currency amount: | € 9.999.999,9 | |
sr (Serbian)
| First day of week | 2 | |
| Date (long) | 29.11.2008. | not really a long format |
| Date (medium) | 29.11.2008. | |
| Date (short) | 29.11.08. | |
| Month name | новембар | correct, but Latin variant missing |
sr BA (Serbian – Bosnia and Herzegovina)
| First day of week | 2 | |
| Date (long) | 29. новембар 2008. | correct, but Latin variant missing |
| Date (medium) | 2008-11-29 | apparently fallback to ISO, dd.mm.yyyy. is usual format |
| Date (short) | 08-11-29 | unusual? |
| Month name | новембар | |
| Currency: | BAM | |
| Currency amount: | КМ. 9.999.999,90 | probably correct, but Latin variant missing |
sr CS (Serbian – Serbia and Montenegro)
| First day of week | 2 | |
| Date (long) | 29.11.2008. | not really a long format |
| Date (medium) | 29.11.2008. | |
| Date (short) | 29.11.08. | |
| Month name | новембар | correct, but Latin variant missing |
| Currency: | CSD | |
| Currency amount: | CSD 9.999.999,90 | |
sr ME (Serbian – Montenegro)
| First day of week | 2 | |
| Date (long) | 29.11.2008. | not really a long format |
| Date (medium) | 29.11.2008. | |
| Date (short) | 29.11.08. | |
| Month name | новембар | correct, but Latin variant missing |
| Currency: | EUR | |
| Currency amount: | € 9.999.999,90 | |
sr RS (Serbian – Serbia)
| First day of week | 2 | |
| Date (long) | 29.11.2008. | not really a long format |
| Date (medium) | 29.11.2008. | |
| Date (short) | 29.11.08. | |
| Month name | новембар | correct, but Latin variant missing |
| Currency: | RSD | |
| Currency amount: | RSD 9.999.999,90 | |
