Cautioning against the perils of machine translation, especially when expectations surpass achievable results, Michael Bauer has posted on his Dear Developer blog a letter he wrote to the Conservative MSP Murdo Fraser, following the latter’s suggestion that Scottish (Scottish Gaelic) be added to the list of languages available on Google Translate:
“I’m sure that this is a well-intentioned idea but in my professional opinion, it would have terrible consequences. As one of the few people who work entirely in the field of Gaelic IT, I have a keen interest in technology and the potential benefit – and damage – this offers to languages like Gaelic. As it happens, I also was the Gaelic localizer (i.e. translator) for Google when it was still running the Google In Your Language programme and I have watched (often with dismay) what Google has done in this area since. One of the projects that certainly caught my eye was Google Translate, especially when Irish was added as a language in 2009. But having spoken to Irish people working in this field and having watched the effects of it on the Irish language, I rapidly came to the conclusion that while it looks ‘cool’, being on a machine translation system for a small(er) language was not necessarily a benefit and in some cases, a tragedy.
Without going into too much technical detail, machine translation of the kind that Google does works best with the following ingredients:
– a massive (billions of words) aligned bilingual corpus
– translation between structurally similar languages or
– translation from a grammatically complex language into a less grammatically complex language but not the other way round
– translation of short, non-colloquial phrases and sentences but not complex, colloquial or literary structures
In essence, machine translation trains an algorithm in ‘patterns’, which is why massive amounts of data are needed and why it works better from a complex language into a less complex language.
Unfortunately for Irish, none of these conditions were met – and would also not be met for Scottish Gaelic. To begin with, even if we digitized all the works ever produced which exist in English and Gaelic, the corpus would still be tiny by comparison to the German/English corpus for example.
Then there is the issue of linguistic distance: Irish/Gaelic and English are structurally very different, with Gaelic/Irish having a lot more in the way of complex grammatical structures than English.
Whatever the intentions of the developers, people will misuse such a system. I have put together a few annotated photos which illustrate the scale of the disaster in Ireland here. From school reports to official government websites, there are few places where students, individuals or officials trying to cut corners have not used Irish translations of Google Translate in ways they were not intended to be used.
I think we can all agree that the last thing Gaelic needs is masses of poor quality translations floating around the internet. Funding is extremely short these days and this would, in my view, be a poor use of these scarce funds. There are more pressing battles to be fought in the field of Gaelic and IT, such as the refusal by the 3rd party suppliers of IT services to Gaelic schools and units to provide (existing) Gaelic software or even a keyboard setting in any school that allows students to easily input accented characters, be that for Gaelic, Spanish or French.”
All good points, though you should read the full letter here. To these objections has been added the voice of John Storey, who has worked with the Gaelic Books Council and knows a thing or two about this issue:
“Without sounding too dramatic, may I suggest that, far from being positive, the potential exists for Google Translate to have a very negative impact on the future well-being of Gaelic. Many advocates of Google Translate forget that the current Google set-up favours English and other majority languages. Many of us argue that Google Translate actually exacerbates the dominance of majority languages such as English. Having the ability to switch from a minoritized language (in this case, Scottish Gaelic) to a majority language, English, can potentially encourage some interest and appreciation of the target language, but more often than not this will only encourage learning at a very basic level.
Another key area of concern is that of translation itself: standards of translation, as well as the well-being of the fragile but vibrant Gaelic translation sector in Scotland. I have worked in the Gaelic publishing industry for a number of years now and have had regular dealings with a variety of translators throughout the country.
No doubt Google Translate’s technology, software and performance will improve in future years, but you will never match the quality, subtlety and sensitivity available through a skilled human translator.
The danger is that many non-Gaelic individuals and groups – including companies who are asked to translate into Scottish Gaelic (perhaps, for example, as a result of the Gaelic Language (Scotland) Act) – will use Google Translate as the ‘easy option’. [ASF: As we know in Ireland from some recent scandals which caused considerable public outrage]
I would, with the greatest respect, ask you to reconsider. Technology is vital to the future of minoritized languages such as Gaelic. However, Google Translate and Gaelic should not be a priority at the moment: there are many more pressing needs with regard to our language and technology. For example, the national Gaelic development body, Bòrd na Gàidhlig, could be supported to address the shortage of technology and computer teaching staff in Gaelic medium education, particularly at Secondary School level; they could be allowed to encourage increased investment in Gaelic gaming and the software industry; they could be looking at practical measures to increase usage with regard to mobile technology and Gaelic; and so forth. There are a host of other technological needs.”
Personally, I have to agree that standardised “language localisations” of software and technology are more deserving of funding and development by governments in Ireland and Scotland than online translation programs of dubious quality or worth. Irish-speakers and Scottish-speakers already exist. What they require are services and goods in their own languages, something that will of course also benefit new speakers of the Gaelic dialects. Establishing a consistent linguistic milieu across several technological platforms, from PCs to smartphones, TVs to Blu-ray players, Windows to OS X, Facebook to Snapchat, should be the primary objective in this area if state support is to be involved.