Saturday, July 02, 2005

Localization, now!

Localization is a very important part of commercial software development. Unfortunately, it is often forgotten in the open source world. This is understandable: open source software does not aim to conquer national markets, and its vendors hardly ever do local marketing. It is important to distinguish localization from translation. Translation takes care of rendering the user interface and documentation in the desired language, while localization makes sure that programs offer the same functionality in every language.

How important is localization? Imagine buying a Ferrari only to learn that you cannot drive it faster than 60 km/h outside Italy...



Websites providing content


This section is about applications that use external websites to serve additional content. Here are two examples, both from amarok.

The first is the information tab, which uses Wikipedia to display information about the artist of the currently playing song. Even when running with a Polish locale, amarok searches the English Wikipedia by default, and most Polish artists are not covered there. The user is told that no page was found, while it is very likely that such a page exists in the Polish Wikipedia (the band Pudelsi is one example). An absolute l10n no-go.
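One way to avoid the English-only default is to derive the Wikipedia language from the locale and fall back to English only when the local article is missing. This is a minimal sketch, not amarok's code: it assumes POSIX-style locale strings such as `pl_PL.UTF-8`, and `page_exists` is a hypothetical hook standing in for whatever HTTP check a real player would perform.

```python
from typing import Callable, Optional
from urllib.parse import quote


def wikipedia_language(locale: str) -> str:
    """Map a POSIX-style locale such as 'pl_PL.UTF-8' to a Wikipedia language code."""
    lang = locale.split(".")[0].split("@")[0].split("_")[0].lower()
    # The 'C' and 'POSIX' locales carry no language preference.
    if lang in ("", "c", "posix"):
        return "en"
    return lang


def wikipedia_url(lang: str, title: str) -> str:
    # Article URLs use underscores for spaces; other characters are percent-encoded.
    return f"https://{lang}.wikipedia.org/wiki/{quote(title.replace(' ', '_'))}"


def find_artist_page(
    artist: str, locale: str, page_exists: Callable[[str], bool]
) -> Optional[str]:
    """Try the locale's Wikipedia first and fall back to the English one."""
    languages = [wikipedia_language(locale)]
    if languages[0] != "en":
        languages.append("en")
    for lang in languages:
        url = wikipedia_url(lang, artist)
        if page_exists(url):  # hypothetical hook, e.g. an HTTP HEAD request
            return url
    return None


# Pretend only the Polish Wikipedia has an article on the band.
only_polish = lambda url: url.startswith("https://pl.")
print(find_artist_page("Pudelsi", "pl_PL.UTF-8", only_polish))
# https://pl.wikipedia.org/wiki/Pudelsi
```

Falling back to English rather than failing keeps the tab useful for international artists, while Polish users get Polish articles first.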


A much bigger localization failure is the lyrics tab. Searching it for most Polish lyrics is simply pointless, and it desperately needs localization. Even the encoding of the artist name is broken by the search engine behind the lyrics tab. This is of course much harder to localize, since it would require coding skills. Still, it remains a huge bug; I doubt a commercial vendor would release this feature in its current state on a non-English market.
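The encoding part of the problem, at least, has a well-known fix: send the search terms percent-encoded as UTF-8. The sketch below does not reflect how the lyrics backend actually builds its queries; the parameter names `artist` and `song` and the sample title are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode


def lyrics_query(artist: str, song: str) -> str:
    """Build a query string with both fields percent-encoded as UTF-8."""
    # 'artist' and 'song' are hypothetical parameter names, not a real backend's.
    return urlencode({"artist": artist, "song": song}, encoding="utf-8")


def decode_query(query: str) -> dict:
    """Recover the fields the way a UTF-8-aware server would."""
    return {key: values[0] for key, values in parse_qs(query, encoding="utf-8").items()}


query = lyrics_query("Zespół", "Łza")
print(query)                # artist=Zesp%C3%B3%C5%82&song=%C5%81za
print(decode_query(query))  # {'artist': 'Zespół', 'song': 'Łza'}
```

Encoding the same name as ISO-8859-1 would not even work, since "ł" does not exist in that character set; both ends have to agree on UTF-8 (or another encoding that covers Polish).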



Amarok is the first application to use content-providing websites to such an extent, so it is only natural that the amarok team made these mistakes. I love amarok and have great respect for what the amarok folks wrote; please treat this article as a hint and not as a disparagement of amarok.




Speech synthesis


The kttsd developers did a beautiful job localizing speech synthesis via Festival. One Polish voice (male) is present on the voices list. There seems to be one more Polish voice (female) for MBROLA, but I cannot check whether it works with Festival. Still, kttsd has two issues. First, when I checked the voices file, there was no voice listed for languages outside Europe and the USA. I would bet there are Arabic voices for Festival; that is quite a localization issue for a big part of the world. Second, there are no predefined synthesizers for the command plug-in. For Polish there is one free synthesizer for Linux, powiedz, a command line tool, and it could be supported by default too.



Translation service


Half of KDE applications use the famous Babelfish and Google translation services, from konqueror's add-ons, through kopete's translator plug-in, to kbabel. For years no one noticed that translation services exist for languages other than the handful supported by those two engines. One example is translantica, an English<->Polish translation engine created by researchers at the Polish Academy of Sciences. I am sure similar engines exist for Arabic. Even if there are no Spanish<->Polish or Japanese<->Polish services, one can still do Spanish<->English via Babelfish or Google and then English<->Polish via translantica. This works for any language that has a bidirectional English translation engine. No localization whatsoever here.
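The chaining idea above is usually called pivot translation. Here is a minimal sketch of a registry that tries a direct engine first and otherwise routes through English. The `TranslationRegistry` class is hypothetical, and the engines are stand-in callables: their real interfaces (Babelfish, Google, translantica) are not modelled here.

```python
from typing import Callable, Dict, Tuple

Translator = Callable[[str], str]


class TranslationRegistry:
    """Hypothetical registry of translation engines keyed by (source, target) language."""

    def __init__(self) -> None:
        self._engines: Dict[Tuple[str, str], Translator] = {}

    def register(self, source: str, target: str, engine: Translator) -> None:
        self._engines[(source, target)] = engine

    def translate(self, text: str, source: str, target: str, pivot: str = "en") -> str:
        direct = self._engines.get((source, target))
        if direct is not None:
            return direct(text)
        # No direct engine: chain source -> pivot -> target if both legs exist.
        first = self._engines.get((source, pivot))
        second = self._engines.get((pivot, target))
        if first is not None and second is not None:
            return second(first(text))
        raise LookupError(f"no engine for {source}->{target}, even via {pivot}")


registry = TranslationRegistry()
# Stand-in engines that only tag the text with the direction they "translated" in.
registry.register("es", "en", lambda text: f"[es->en] {text}")
registry.register("en", "pl", lambda text: f"[en->pl] {text}")
print(registry.translate("hola", "es", "pl"))  # [en->pl] [es->en] hola
```

Each hop adds its own errors, so a pivoted translation will usually be rougher than a direct one, but it beats offering no Polish option at all.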



Search providers


I have set the Polish locale, but I use the English language. Now I want to google something in Konqueror. Guess which Google version Konqueror loads when I use a search provider? Yes, the English one. I cannot describe how annoying this is. And there are more services like that (e.g. Wikipedia). Of course, this flaw is only partial, since KDE running with the Polish translation should at least choose the Query[pl] URL over the plain Query one. But what about Polish search providers? I have made a collection of them for PLD Linux, which I am working on, and I will commit them once my to-do list for them is complete. Still, I am completely astonished that no one else from the community has submitted such data. So there are two localization issues here.
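Query[pl] is a localized key in the search provider's .desktop-style file. The sketch below picks the most specific localized key for a given locale, a simplified form of the matching rules in the freedesktop.org Desktop Entry specification (which also handles @modifier suffixes). It deliberately keys off the locale setting rather than the UI language, which is the behaviour the scenario above asks for; it is not a description of what Konqueror actually does, and the sample URLs are placeholders.

```python
from typing import Dict, Optional

# Illustrative search provider entry; the URLs are placeholders, not KDE's shipped data.
SAMPLE_PROVIDER = """\
[Desktop Entry]
Name=Google
Query=https://www.google.com/
Query[pl]=https://www.google.pl/
"""


def parse_entry(text: str) -> Dict[str, str]:
    """Read Key=Value lines, skipping blank lines, comments and [group] headers."""
    entry: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "[")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            entry[key.strip()] = value.strip()
    return entry


def localized_value(entry: Dict[str, str], key: str, locale: str) -> Optional[str]:
    """Prefer key[lang_COUNTRY], then key[lang], then the untranslated key."""
    base = locale.split(".")[0].split("@")[0]  # 'pl_PL.UTF-8' -> 'pl_PL'
    lang = base.split("_")[0]                  # 'pl_PL' -> 'pl'
    for candidate in (f"{key}[{base}]", f"{key}[{lang}]", key):
        if candidate in entry:
            return entry[candidate]
    return None


provider = parse_entry(SAMPLE_PROVIDER)
print(localized_value(provider, "Query", "pl_PL.UTF-8"))  # https://www.google.pl/
print(localized_value(provider, "Query", "en_US.UTF-8"))  # https://www.google.com/
```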



Thesauruses and dictionaries


This is the last localization aspect, but a heavily underrated one in the KDE community right now. How many koffice thesauruses can you count? I have yet to see a non-European thesaurus for koffice. The same goes for dictionaries: compared with the quality of Microsoft's dictionaries in the latest Office suite, there is no Polish dictionary around with support for punctuation and other advanced language rules, not even in the commercial OpenOffice-based suites. There is, however, an open thesaurus project in Poland that also provides its thesaurus in koffice format. Still, what about non-European languages?



Conclusions


The situation is not perfect: while most Western European languages do not lack good localization, the need for localization in other languages is vast. There is work to be done on two sides of KDE.

Developers


Developers need to make localization possible. Much of the infrastructure already exists, such as thesauruses and the kspell framework, but some things are still missing: a way to add translation engines and to specify local equivalents of content-providing websites. Ideally, these possibilities would be provided without requiring the localization team to have programming skills.
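A plain configuration file is one way to let localization teams supply local equivalents without writing code. The following is a minimal sketch assuming an INI-style file; the section and key names (`wikipedia`, `translation`, `default`) and the `service_for` helper are hypothetical, not an existing KDE format.

```python
import configparser

# Hypothetical file a localization team could edit without programming skills.
SERVICES_INI = """
[wikipedia]
default = https://en.wikipedia.org/wiki/
pl = https://pl.wikipedia.org/wiki/
nl = https://nl.wikipedia.org/wiki/

[translation]
default = babelfish
pl = translantica
"""


def load_services(text: str) -> configparser.ConfigParser:
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


def service_for(config: configparser.ConfigParser, service: str, lang: str) -> str:
    """Return the team-supplied value for `lang`, else the section's default."""
    section = config[service]
    return section.get(lang, section["default"])


config = load_services(SERVICES_INI)
print(service_for(config, "wikipedia", "pl"))    # https://pl.wikipedia.org/wiki/
print(service_for(config, "translation", "de"))  # babelfish
```

The program only needs to know the lookup rule; adding a language becomes a one-line change that a translator can send in alongside the usual .po files.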


Users


Well, it is up to users to form localization teams; developers cannot know about every aspect of localization. Hey, go for it!


2 comments:

Unknown said...

wrt Speech Synthesis.

KTTS currently supports the following languages, provided you have the appropriate voices/synthesizer installed:

English
Finnish
French Canadian
Spanish
Mexican Spanish
German
Hungarian
Italian
Polish
Russian
Slovak
Swahili
Zulu

Thing is, the Festival Interactive plugin will not show you the voices unless they are installed. (What's the point in showing a choice you can't use?)

These are all the non-commercial voices and synthesizers I've been able to find that actually work under Linux. (Some of the languages listed above rely on commercial Cepstral voices running under the non-commercial Festival synth.)

I'm always open to additional contributions. Just send me the details where I can find the voices and/or synth.

The KTTS Handbook and website give information on installing the voices and synths above.

The "pre-configured" command plugins are a good idea.

The sad thing about the availability of other languages is there's a ton of them listed at the MBROLA website, but no non-commercial software for doing the text to phoneme conversion.

I blogged about this with respect to a French language a few weeks ago.

Unknown said...

The local communities (say, KDE Netherlands) play an important role in l10n. That's the reason why we stimulate these communities.

About the search providers, the current implementation is a mess, IMO. For example, the Dutch team has added some local pages to the search providers, which are of no use to the rest of the world. I made some effort to change this, but haven't committed anything so far.