Speaking My Language

We often celebrate support for many languages as one of the many outputs of the Open Source methodology. Jono Bacon and Stuart ‘Aq’ Langridge explore how the translations community works and ask how effective it currently is, and how effective it could be.

Of course, this is only the start of the conversation! What do you think? Could the translations community be improved? Should we care about translations? Do you have any experience or stories you want to share about translating software? Is the commercial approach to translations better or worse?

29 Comments to “Speaking My Language”

  1. James Duncan 26 May 2010 at 11:13 am #

    I’ve only needed to use one translation, but Ubuntu’s Georgian translation is excellent.

  2. skkeeper 26 May 2010 at 11:34 am #

    I’m Portuguese and I used Rosetta to help translate some strings a while ago. The problem I faced is exactly the one you discuss in this episode: because Portuguese is so far removed from technical software terminology, I found it really hard and eventually stopped doing it.

    On the quality side, crowdsourcing always starts with a lack of quality, but that’s just how FOSS works. As more people jump in, the quality improves over time.

    Namaste & keep Shot of Jaq Alive : D

    • sil 26 May 2010 at 11:59 am #

      That’s really interesting. So the problem isn’t really the technology, it’s just that it’s hard to translate technical terms into Portuguese? Is this problem fixable in any way? I suppose that it doesn’t matter whether you’re using a crowdsourced model or not if there are no translations…

  3. Seetee 26 May 2010 at 11:41 am #

    File, Edit, View, etc… Are there any standardised translations of these? Any guidelines from the larger projects (Mozilla, OpenOffice.org, Kubuntu, etc.)? In Swedish the menu option “File” is mostly translated as “Arkiv”. Translate that back into English and, interestingly enough, you get “Archive”. Close, but not exactly the same; traditionally, though, it has always been translated that way. Swedish translations are often 75-90% complete. But in many applications it is not that easy to change the language, and a mixed environment is more confusing than one entirely in the original language.

    • Till Eulenspiegel 26 May 2010 at 1:01 pm #

      Since most of those English strings were likely originally borrowed from Windows or Microsoft Office or MacOS, it’s probably safe to steal Microsoft’s or Apple’s translations of those terms, right? They’re de facto standard.

      I’m writing a game and I speak decent German, though I’m nowhere near fluent. So I’m borrowing a lot of unfamiliar terms from the translations of Guild Wars and World of Warcraft, rather than muddling around with dictionaries or Wikipedia.

  4. bittin 26 May 2010 at 2:51 pm #

    I sometimes translate Ubuntu from English to Swedish via Launchpad when I’m bored; I’ve only done about 220 strings, though :p

  5. mjjzf

    50% translated is shit, like the half translation of Ubuntu into Esperanto (http://writtenandread.net/files/xubuntu-esperanto.ogg). Half translated is considerably less useful than not translated at all.

    I did the Danish AbiWord translation, and that took a long time: ~1600 text strings, where a text string can be a single word or a dialog phrase of some length. As an interesting example of the challenges in translation, ‘columns’ has two translations: columns in a table (kolonner) and columns on a page (spalter), the latter being a traditional newspaper printing term. And it is not easy to tell which one is meant, even when looking at the surrounding code!

    Having said that: the Danish translation project (dansk-gruppen.dk) has a list of commonly occurring terms and an active mailing list for discussing terminology and coordinating translations. It was through this that Xfce was translated in a very short time with a very concentrated effort.

    One thing they sometimes run into, and this is a major challenge for software translation, is terms where the English word is the one professionals actually use, but where some translators (and this, of course, is why they are drawn to the work) think it is completely unacceptable that there is no native term, or that the constructed Danish term is much better, even if practically no one uses it. So there are philosophical differences within the translation project, and this makes it less uniform than one would like.

    I am a linguist by education. If I were hired to do the translation, the quality would not be better than it is now. There are, however, some people with very good intentions but less impressive abilities, and they would probably be filtered out if a professional translator were doing the job. Still, software translation uses a very particular terminology, and a linguist is not necessarily as good at it as a technologist.

    The Danish translation team runs the translations through the mailing lists so they can be verified, and this is definitely a strength; it also amounts to an editorial process that proprietary software companies cannot afford.

  6. Diego Castro 26 May 2010 at 7:28 pm #

    Portuguese here too. But I don’t use any translated software; there is always something lost in translation. And try searching the forums for a solution in Portuguese… it’s a pain, because the English results are always the best.

    • Sense Hofstede 26 May 2010 at 7:46 pm #

      Documentation and support are often better in English because more people maintain them.

      I don’t think this is caused only by the fact that the English-speaking community is larger than, say, the Welsh-speaking one. A very important reason is that people from countries where English isn’t the main language are less active in the community. For example, when I went to UDS in Belgium (close to the Netherlands), I met three or four Dutch people, one or two of whom were Canonical employees. I also met or saw three people from New Zealand, all three community members. That is from the other side of the world!

      I think there are simply relatively fewer non-native English speakers contributing to Ubuntu, in any area.

      • sil 27 May 2010 at 8:18 am #

        Is that not just because the translation isn’t very good, though? There’s no reason why there couldn’t be a very vibrant Dutch-only Ubuntu community, completely separate from the existing English-speaking one, except that such a community is hardly going to get started if the Dutch translation isn’t perfect.

        • Sense Hofstede 1 June 2010 at 5:21 pm #

          Well, translations of the product (Ubuntu, in our case) are certainly important, but so is the documentation: documentation for end users, but also for application developers, and API references. Something almost more important than the documentation, though, is the localisation of the infrastructure. Take the team roadmaps and team reports: the documentation explaining how to use the roadmaps and how to create team reports is written in English, because it was started in the main community (the place where stuff happens, and therefore the place where eventually everyone wants to be), and it is not translated. The platforms used to create the content (Launchpad and the Ubuntu Wiki) are not translated either. And the content itself has to be in English if you want the rest of the world (the UWN and others) to pick it up. So you basically have to do everything in English if you want to be noticed by the rest of the world, which includes people who could help the LoCo (besides, it’s just nice to get recognition). That gives the English-speaking LoCos an advantage. The fact that the infrastructure is not translated is annoying and seriously hinders some LoCo members.

          The other main problem, which I touched on briefly in the first part of this reply, is that everything revolves around the international (read: English-speaking) community. Ubuntu started as a small group of people working on Ubuntu. Because of where those people came from, and for the sake of convenience, the working language was English. That makes sense. Now, however, we have an international community whose main language is English (which also makes sense), and that international community is where all the cool things happen. It is where the new, cool stuff for the next Ubuntu release is developed, and it is where you can contribute to Ubuntu through, e.g., bug triaging and packaging. Even areas like documentation are more or less part of the international community when they are in English. A lot of people want to join this cool international community. I didn’t start in my LoCo; I started contributing in the international community because I found it much more interesting. This drains people from the LoCos, but it also means that some people who certainly have the ability and the will to contribute cannot, because of the language barrier. Even if we had perfect translations of everything, we’d still see people leaving for the international community. They would have to, anyway, if they want to create NEW content rather than just translate content written by others.

  7. Sense Hofstede 26 May 2010 at 7:42 pm #

    A great improvement that has come into use lately is support for language fallback. For example, in the province where I live the local language is fy_NL (Frisian), but translations into it are rare (I don’t speak the language, but about a million people do). So the system falls back to nl_NL (Dutch), because everyone here speaks that language as well. The next fallback is en_GB (the fallback for all European languages, iirc), and the last is of course plain en. That is really helpful because it lets people with less knowledge of English use an incomplete translation of their language. More use means more exposure, which means more contributors.

    ‘The door in which the cat walks’ is a great example: sometimes it is hard to give the correct translation. Most computer words in Dutch are taken straight from English because there were no Dutch words for them, and unlike the French we don’t make up our own. Especially when things get very technical and specific, and the strings are short, it is very, very hard to give a proper Dutch translation.

    I do find that the Dutch translation in Ubuntu feels more Dutch than the one in Windows. For example, Windows translates ‘Print’ as ‘Printen’ (to print), but Ubuntu uses ‘Afdrukken’. It means the same, but ‘Afdrukken’ is not a loan word. Also notice the difference in verb form: English uses ‘Print’, while Dutch always uses the infinitive (‘to print’) for actions.

    The Dutch translation of Windows does feel more professional than the Dutch translation in Ubuntu.

    • MadJo 28 May 2010 at 9:46 am #

      I’m afraid that you are wrong, Sense. I’m behind a Dutch Windows XP machine right now, and it says “Afdrukken” in the “Bestand” menus and not “Printen”. IMO, it’s a very poorly translated product if it uses “Printen” instead of “Afdrukken”.

  8. conor.hogan.2 26 May 2010 at 9:58 pm #

    Open source is excellent for Irish, and even XP and OS X are translated into Irish.

    Firefox and OpenOffice are the ones I use daily; very nice.

    I just participated in a thread (http://boards.ie/vbulletin/showthread.php?t=2055918774 – if you want to google translate it) about Linux in Irish.

    Languages are one of the reasons I love and support open source.

  9. conor.hogan.2 26 May 2010 at 10:35 pm #

    A few more thoughts after listening and reading the comments.

    I find it interesting that translators for several languages say the hardest part is finding the words, especially since the Romance languages can borrow from each other. Wouldn’t that make it easier, along with the fact that they have millions and millions of speakers compared to Irish, Welsh, Breton, Basque, etc.?

    The cat flap in Irish is “comhla an chait” (“door of cat”/“cat door”), but most things don’t translate directly, as languages are often so different.

  10. Sodki 27 May 2010 at 11:33 am #

    Another Portuguese here.

    Software translation into Portugal’s Portuguese (pt_PT) has a really bad legacy reputation, because the majority of Portuguese speakers are from Brazil (pt_BR) and therefore most companies only translated software into pt_BR. pt_PT and pt_BR are very different and, to top it off, most translations weren’t that good anyway: the Portuguese version of Windows 95 and 98 was mind-blowing crap, and using it really affected the stability of the system.

    Just to make this clear: pt_PT and pt_BR are so different that most people here just don’t use the Portuguese version of Wikipedia, because it’s written mostly by Brazilians and is therefore utter crap to us. :)

    Because of all that, people tended to ignore or avoid the Portuguese versions of software. Fortunately, this is not the case anymore. More and more companies are providing pt_PT translations of their software, and the quality of the translations is increasing too.

    Regarding Free Software, I mainly use English, but I like to use pt_PT too, just to see where we stand, and the quality is really, really good nowadays. I could use the pt_PT version of GNOME any day and feel at home. Translation and localization are an extremely important goal of Free Software, and we are getting there at a steady pace.

    I only wish we could harness the education system and put some student time to good use. English assignments could be viewed differently if they affected millions of people. :)

    • conor.hogan.2 27 May 2010 at 7:24 pm #

      Hmmmm.

      Surely it is closer to Portuguese than say Spanish?

      It has the same structure, just variant words and idioms etc – completely understandable, no?

      • Sodki 29 May 2010 at 12:26 pm #

        Yes, but it’s not like the difference between US English and UK English. It’s way more than that. We can understand each other in many ways, but it’s becoming increasingly difficult.

        • conor.hogan.2 29 May 2010 at 1:03 pm #

          Okay.

          So is it harder to understand than Spanish (from Spain)?

  11. LN2 27 May 2010 at 1:07 pm #

    Being fairly fluent in two languages (German and English), I just wanted to add my thoughts.

    First, I really don’t think paying ‘professionals’ to translate free software would necessarily improve the quality of the translations. Believe me, I have read my fair share of university textbooks (and I am talking about books that cost 50 Euros or more) whose translations contained mistakes that a first-year student in the subject would have been very unlikely to make. Basically the translations sucked, even though the person who did them was in all likelihood paid to do it. If paying people to do a job were all it took to get it done decently, I guess a lot of proprietary software would suck a lot less ;-) .

    Second, as mjjzf has already brought up, translating some strings is sometimes made unnecessarily difficult because the text string alone does not provide enough context to come up with the correct, best, or most idiomatic translation. I think it is already possible for developers to attach additional information to a string for translators, but unfortunately (at least from what I’ve seen of Launchpad’s Rosetta) this feature is rather underused.

    Third, this might be a matter of personal taste, but I think translators in free software often translate too much. I really see no point in translating command line stuff; the function names of programming languages are not translated either. The commands you type at the command line are either English or at least derived from English words. A lot of the lower-level error messages are not translated either, so all you end up with is a mixture of English and your local language, which in my opinion is worse than simply having everything in English. Sometimes, at least in my opinion, less translation would be more.

  12. Hezy 27 May 2010 at 11:05 pm #

    I guess there are rare languages that benefit from the availability of free software, but when it comes to Hebrew I must say that Microsoft is doing a very good job in the translation department. This is not a trivial thing: Hebrew (like Arabic and Farsi) is written from right to left. Translation is not just a matter of changing strings, but a change in the way text and UI are rendered. The problem is more profound than you might guess: numbers are still written left to right, and you have to support switching to left-to-right languages even in the middle of a sentence. Microsoft was early in this game, and as a result got 99.99% of the OS market. Today Linux is fine with Hebrew, but new releases of major free software like OpenOffice still revive Hebrew-related bugs that were already fixed in earlier versions.

  13. Busby 28 May 2010 at 12:22 am #

    So, the consensus here seems to be that crowd sourced translations of big open source projects are of a higher quality on a string-by-string basis than their proprietary equivalents (Ubuntu vs Windows, OpenOffice vs MS Office). Of course this makes sense because the people doing the translating are the people using the product.

    Jono raised an interesting point when he said that good/fairly complete translations of open source projects exist in areas where proprietary translations do not exist. Is the opposite also true? Are crowd sourced translations more scarce in areas where good proprietary translations are available? If so, is this effect inherent to our model, or are there ways around it?

  14. MadJo 28 May 2010 at 9:50 am #

    I have contributed to a few translation projects in the past, and I’m very curious about Rosetta; I hadn’t heard of it before. Personally, I tend to use the English version anyway, but that doesn’t mean other Dutch people should be denied access to decent translations. I am by no means a professional translator (I test software for a living), but I do know my way around the languages. And indeed the biggest hurdle is the stuff that has no analogue in the target language. But usually there is a way around that, for instance by keeping the original word.

  15. Ramón 28 May 2010 at 5:20 pm #

    Ubuntu user from Spain here. I just want to say that the Spanish translation is stunning. Everything in the GNOME desktop is perfectly translated. I’ve rarely seen weird translations or untranslated strings in the default applications, or even in popular non-default applications (e.g. Inkscape). You have to install quite new or very specialised, less popular software to see bad or untranslated text.

    I could use my system in English since I understand it quite well (the proof is that I’m one of your faithful listeners), but I feel very comfortable using Ubuntu in Spanish on a daily basis.

  16. Shane Fagan 29 May 2010 at 11:00 am #

    I’m the Irish translations manager for Ubuntu, but the main problem is that no one speaks Irish well enough :) I’d love it if someone translated Ubuntu into pirate English or Klingon.

  17. Gerv 3 June 2010 at 1:00 pm #

    Firefox 3.6 is available in 74 languages or dialects: http://www.mozilla.com/en-US/firefox/all.html including every single language and dialect mentioned in these comments so far. Even Esperanto.

    Gerv

  18. Gerv 3 June 2010 at 1:00 pm #

    Oops, sorry, didn’t notice Shane mentioned Klingon and Pirate. We don’t have those. Yet.

    Gerv

  20. Prosthetic Head 6 June 2010 at 10:44 pm #

    I’m guessing here, but if the more common strings are translated first, then a 50% complete translation may well cover 99.5% of what you actually see day to day.
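
Three mechanisms discussed in the comments can be shown in miniature: the language fallback chain from comment 7 (GNU gettext, for example, accepts a colon-separated priority list such as LANGUAGE=fy_NL:nl_NL in the environment), the message context that would tell a Danish translator which ‘columns’ is meant (comments 5 and 11), and the guess in comment 20 that a half-finished translation can cover most of what a user actually sees. The Python below is a toy sketch under stated assumptions, not how Launchpad, GNOME or gettext really store translations: catalogues are plain dicts rather than compiled .mo files, the context key borrows gettext's convention of joining context and string with the \x04 character, and the helper names, catalogue contents and usage counts are all hypothetical (only the Dutch and Danish words are taken from the comments).

```python
# Toy model of a message catalogue lookup, loosely following GNU gettext conventions.
# Assumptions: catalogues are plain dicts (real ones are compiled .mo files), and
# all catalogue contents and usage counts below are hypothetical examples.

CONTEXT_SEP = "\x04"  # gettext joins msgctxt and msgid with the EOT character


def make_key(msgid, context=None):
    """Build a lookup key; a context lets one English string have several translations."""
    return msgid if context is None else f"{context}{CONTEXT_SEP}{msgid}"


def translate(msgid, chain, catalogues, context=None):
    """Try each language in the fallback chain in order; fall back to the English msgid."""
    key = make_key(msgid, context)
    for lang in chain:
        text = catalogues.get(lang, {}).get(key)
        if text:  # an empty translation counts as untranslated
            return text
    return msgid


def coverage(catalogue, usage_counts):
    """Return (share of strings translated, share of on-screen appearances translated)."""
    done = [key for key in usage_counts if catalogue.get(key)]
    by_string = len(done) / len(usage_counts)
    by_usage = sum(usage_counts[key] for key in done) / sum(usage_counts.values())
    return by_string, by_usage


catalogues = {
    "fy_NL": {},  # hypothetical: Frisian catalogue with nothing translated yet
    "nl_NL": {"File": "Bestand", "Print": "Afdrukken"},
    "da_DK": {
        make_key("Columns", "table"): "Kolonner",
        make_key("Columns", "page layout"): "Spalter",
    },
}

print(translate("Print", ["fy_NL", "nl_NL"], catalogues))            # Afdrukken
print(translate("Columns", ["fy_NL", "nl_NL"], catalogues))          # Columns
print(translate("Columns", ["da_DK"], catalogues, context="table"))  # Kolonner
print(translate("Columns", ["da_DK"], catalogues, "page layout"))    # Spalter

# Hypothetical counts of how often each string is shown to a user.
usage = {"File": 900, "Print": 90, "Columns": 5, "Hyphenation": 5}
print(coverage(catalogues["nl_NL"], usage))  # (0.5, 0.99)
```

With these made-up counts, half of the strings are translated but 99% of what the user sees is, which is the effect comment 20 describes; whether real software behaves like this depends on actual usage data, which this sketch does not supply. The context key is also why the Danish translator's problem is a tooling issue: unless the developer supplies a context, both ‘Columns’ strings collapse into one key and can only receive one translation.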

