How it all started
The first significant reflections on machine translation (MT) date back to the late 1940s with the work of Warren Weaver, an American scientist and mathematician who wanted to apply his knowledge of cryptography, statistics and language universals to build a computer able to translate human languages. Weaver’s work led to the first public machine translation experiment in 1954, organized by Georgetown University and IBM: it involved translating some Russian sentences into English using a restricted vocabulary of 250 words and only six grammatical rules converted into program code. The experiment drew the attention of the mass media, which helped secure funding for further research. In spite of doubts about the authenticity of the translation, expectations ran high after the experiment: it was thought that within a few years computers would be able to translate many of the world’s languages quickly and make international communication easier.
However, we had to wait until the 1990s to see the expansion of MT systems in commercial agencies, government services and multinational companies as well as the emergence of translation workstations. Eventually, in 2006, we also witnessed the dawn of Google Translate.
How Google Translate adds simplicity and convenience to the MT debate
Google Translate was originally introduced in 2006 for Arabic. Today, it is the most popular machine translation service available online. Many of us are very familiar with the two-column Google Translate display. You simply enter a text in a specific source language in the left column and get a translation in your desired target language with one simple click.
Google Translate can even convert non-Latin scripts into the Latin alphabet, and there is also an app for smartphones. It offers the traditional translation tool, the option of speaking words and phrases to get translations and to listen to them, and a conversation mode. Moreover, you can save your favorite translations and use them in the future. In addition, you can take pictures of messages or writing to get the corresponding translation. The quality of the translations produced by Google is sometimes poor, but the growing number of users indicates that a latent demand is being satisfied.
So how does Google Translate work from a technical point of view? The system is based on “statistical machine translation”, the kind of MT that relies on patterns found in large text corpora. When we learn a language, we usually focus on vocabulary and grammatical rules. Computers can learn a language in this way too, but the many exceptions found in every language make a purely rule-based approach hard to get right. For this reason, Google lets computers discover the rules themselves by analyzing texts that have already been translated, such as books, documents produced by organizations, websites, laws, and articles. Computers scan the texts and look for correspondences between original texts and translations that recur too often to be a coincidence. Once the MT system finds such a pattern, it can reuse it to translate similar texts in the future.
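To make the idea of pattern hunting a little more concrete, here is a minimal Python sketch. It is not Google's actual method (production systems rely on far more elaborate alignment models and vastly more data); it only counts how often a source word and a target word appear in the same sentence pair and turns that into a Dice score between 0 and 1. The three English-German pairs and the function names dice_scores and best_match are invented for illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical toy parallel corpus (English, German), lowercased by hand.
PAIRS = [
    ("the house", "das haus"),
    ("the book", "das buch"),
    ("a book", "ein buch"),
]

def dice_scores(pairs):
    """Score every (source word, target word) pair by how often they co-occur."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)                       # sentences containing each source word
        tgt_count.update(tgt_words)                       # sentences containing each target word
        joint.update(product(src_words, tgt_words))       # sentences containing both
    # Dice coefficient: 2 * joint count / (source count + target count)
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in joint.items()
    }

def best_match(word, scores):
    """Return the target word with the highest score for a given source word."""
    options = {t: v for (s, t), v in scores.items() if s == word}
    return max(options, key=options.get)

scores = dice_scores(PAIRS)
for word in ["the", "house", "book", "a"]:
    print(word, "->", best_match(word, scores))
```

Words that always show up together, such as "book" and "buch", get the top score, while accidental neighbours such as "the" and "haus" score lower. With only three sentence pairs the result is fragile, which is why the amount of available data matters so much.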
Google Translate and its love story…
The more data available for a language or for language pairs, the better the translations by Google Translate. Dr. Ashish Venugopal, research scientist and developer of Google Translate, has a funny story about this. He was trying to win over his future wife, whose mother tongue is Hindi. Dr. Venugopal cannot speak Hindi but found a website with Hindi songs translated into English. Thanks to those translations, he managed to reconstruct standard sentences in Hindi and got his happily ever after.
…and why there is also food involved
The aim of the statistical approach is to always produce some output, even an imperfect one. Dr. Venugopal uses another example to help explain these principles. Think of a Chinese restaurant. We don’t know the grammatical rules of Chinese, but we have the English translation of the dish “sweet and sour beef”. We can then write down the two corresponding Chinese words for “sweet and sour” and “beef”. Later on, we read the English translation of another dish, e.g. “sweet and sour vegetables”, and we see the same Chinese word for “sweet and sour”. Then, we read “vegetable soup” and we see the Chinese word for “vegetable”, as learnt in the previous step. Now we should be able to predict the Chinese expression for a combination we have not seen translated, such as “beef soup”, simply by reusing the pieces we have already matched. Statistical machine translation works more or less in this way.
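The restaurant reasoning can be played through in a few lines of Python. This is only a sketch under generous assumptions: the three menu entries are typed in by hand, both sides are already split into matching chunks, "vegetables" is normalized to "vegetable", and the Chinese word order is assumed to follow the English one. The names MENU, learn_lexicon and translate are made up for this example, and "beef soup" is picked here as a dish that does not appear on the toy menu.

```python
# Toy menu: (English phrases, Chinese tokens), both pre-segmented by hand.
MENU = [
    (["sweet and sour", "beef"], ["糖醋", "牛肉"]),
    (["sweet and sour", "vegetable"], ["糖醋", "蔬菜"]),
    (["vegetable", "soup"], ["蔬菜", "汤"]),
]

def learn_lexicon(menu):
    """Guess one Chinese token per English phrase from the menu alone."""
    # Step 1: a phrase can only map to tokens present in EVERY dish that contains it.
    candidates = {}
    for english, chinese in menu:
        for phrase in english:
            if phrase in candidates:
                candidates[phrase] &= set(chinese)
            else:
                candidates[phrase] = set(chinese)

    # Step 2: rule out tokens that are already claimed by another phrase,
    # and repeat until nothing new can be settled.
    lexicon = {}
    changed = True
    while changed:
        changed = False
        for phrase, options in candidates.items():
            if phrase in lexicon:
                continue
            remaining = options - set(lexicon.values())
            if len(remaining) == 1:
                lexicon[phrase] = next(iter(remaining))
                changed = True
    return lexicon

def translate(phrases, lexicon):
    """Glue the learnt tokens together in the order of the English phrases."""
    return "".join(lexicon[p] for p in phrases)

lexicon = learn_lexicon(MENU)
print(lexicon)
print(translate(["beef", "soup"], lexicon))  # a dish that is not on the toy menu
```

On this tiny menu every phrase ends up with exactly one candidate. On real text there would be ambiguity, reordering and noise, which is where counting and probabilities take over from simple elimination.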
And if you’re now hungry and want to order dinner at your favorite Chinese restaurant – will you do it in Chinese? Why not give it a try? Otherwise just ask MT (or your favorite LSP) for help.