Babel was a city (now thought to be Babylon) where, legend has it, the people attempted to build a tower that would reach into heaven. An enormous undertaking, it required much time and cooperation among the people, who all spoke the same language.
Hearing of this endeavour, God is said to have come down to see the city and declared, “Behold, the people is one, and they have all one language; and this they begin to do: and now nothing will be restrained from them, which they have imagined to do. Let us go down, and there confound their language, that they may not understand one another’s speech.”
And so God confounded the attempts of the builders by splitting their speech into many mutually incomprehensible languages. Soon discord arose; the tower was left unfinished, and the people of Babel scattered across the world.
Whatever your theological beliefs may be, this story is an interesting allegory. While offered as an explanation for the world's many languages, it also illustrates how differences in language can lead to a loss of cohesion.
The Babel fish
In his entertaining novel The Hitchhiker’s Guide to the Galaxy, science-fiction writer Douglas Adams came up with an unusual solution to the problem of understanding multiple languages across the universe – the Babel fish.
Described as "small, yellow, leech-like and probably the oddest creature in the universe", the Babel fish "feeds on the energy of brain waves around it, and excretes into the mind of its carrier a telepathic matrix formed by combining the conscious thought frequencies with nerve signals picked up from the speech centres of the brain which has supplied them.
"The practical upshot of all this is that if you stick a Babel fish in your ear, you can instantly understand anything said to you in any form of language. The speech patterns you actually hear decode the brainwave matrix which has been fed into your mind by your Babel fish."
First steps towards a Babel fish
While text- and voice-based translation applications have been around for a while, NTT Docomo made a giant leap late last year with the launch of an Android-based voice translator for phone calls, the Hanashite Hon'yaku app. The app translates the other speaker's side of the conversation into the required language, delivering both spoken output and a text readout.
The free service is already being used by Docomo customers, and because the app uses Docomo's cloud servers for processing, translation is possible on any smartphone. However, users must subscribe to one of Docomo's packages, so the service is sadly not available on other operators' networks.
Docomo will soon face competition from France's Alcatel-Lucent, which is developing a rival call-translation product named WeTalk. The service will work over any landline and is said to handle Japanese and about a dozen other languages, including English, French and Arabic. The firm says all this can be done in less than a second; however, it has opted to wait until the speaker has stopped talking before starting the translation, after trials suggested that users preferred that experience.
These applications are far from perfect, with errors arising from the difficulty of recognizing the various accents and nuances in a language. The best voice translators typically have an error rate of 20-25%, which is simply not good enough, especially in business environments.
Microsoft Research and the University of Toronto made a breakthrough in improving translation by using a technique called deep neural networks, which is patterned after the behaviour of the human brain. The researchers were able to train more discriminative, more accurate speech recognizers than previous methods allowed. [ 1 ]
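At its core, a neural network of this kind is just layers of weighted sums passed through non-linear functions. The following toy sketch (in Python, with made-up numbers, and a single hidden layer rather than the many layers of a true "deep" network) illustrates the basic forward pass that maps one frame of acoustic features to phoneme probabilities; it is an illustration of the idea, not the researchers' actual system.

```python
import math
import random

random.seed(0)

def relu(x):
    # non-linearity applied between layers
    return [max(0.0, v) for v in x]

def softmax(x):
    # turn raw scores into probabilities that sum to 1
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    s = sum(exps)
    return [e / s for e in exps]

def dense(inputs, weights, biases):
    # one fully connected layer: out[j] = sum_i inputs[i] * weights[i][j] + biases[j]
    return [sum(inputs[i] * weights[i][j] for i in range(len(inputs))) + biases[j]
            for j in range(len(biases))]

# toy dimensions: 4 acoustic features -> 8 hidden units -> 3 phoneme classes
W1 = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(4)]
b1 = [0.0] * 8
W2 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(8)]
b2 = [0.0] * 3

frame = [0.2, -0.5, 0.9, 0.1]           # one invented frame of acoustic features
hidden = relu(dense(frame, W1, b1))     # hidden layer
probs = softmax(dense(hidden, W2, b2))  # probability of each phoneme class

print(probs)  # three probabilities summing to 1
```

Training consists of adjusting the weights so that the probabilities match labelled speech data; "deep" networks simply stack many such hidden layers, which is what made the recognizers more discriminative.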
Back in October 2012, Microsoft researchers demonstrated software that translates spoken English into spoken Chinese almost instantly, while preserving the tone of a speaker’s voice – an innovation that makes conversation more effective and personal.
The demonstration was made by Rick Rashid, Microsoft's chief research officer, at an event in Tianjin, China. "I'm speaking in English and you'll hear my words in Chinese in my own voice," Rashid told the audience. The system works by recognizing a person's words, quickly converting the text into properly ordered Chinese sentences, and then handing those over to speech-synthesis software that has been trained to replicate the speaker's voice. [ 2 ]
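The three stages Rashid describes can be sketched as a simple pipeline. The function names, stub lookups and single hard-coded phrase below are invented for illustration; in a real system each stage would call a trained model rather than a dictionary.

```python
# Hypothetical sketch of the three-stage speech-to-speech pipeline:
# recognition -> translation -> voice-preserving synthesis.

def recognize_speech(audio: str) -> str:
    # stage 1: speech recognition -- audio in, English text out
    return {"<audio: 'hello'>": "hello"}[audio]

def translate_text(english: str) -> str:
    # stage 2: machine translation -- reorder the words into a Chinese sentence
    return {"hello": "你好"}[english]

def synthesize_speech(chinese: str, voice_profile: str) -> str:
    # stage 3: text-to-speech, conditioned on a model of the speaker's own voice
    return f"<audio: '{chinese}' in {voice_profile}'s voice>"

def speech_to_speech(audio: str, voice_profile: str) -> str:
    return synthesize_speech(translate_text(recognize_speech(audio)), voice_profile)

print(speech_to_speech("<audio: 'hello'>", "Rick"))
# -> <audio: '你好' in Rick's voice>
```

The design point is that the stages are independent: the synthesis stage only needs text plus a voice model, which is why about an hour of a speaker's recordings is enough to make the translated output sound like them.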
As Rashid explains on the Microsoft blog, "it required a text-to-speech system that Microsoft researchers built using a few hours' speech of a native Chinese speaker and properties of my own voice taken from about one hour of pre-recorded (English) data, in this case recordings of previous speeches I'd made."
As IBM's Jeopardy! champion "Watson" has shown, with enough information, computers using neural networks can identify puns and wordplay in a language and learn to respond to questions involving them.
With further improvements in translation technology, real-time, near-perfect translation could move from science fiction to science fact in the very near future. Wearable technology such as Google Glass may soon incorporate real-time translation using cloud-based services.
In a country where linguistic differences have affected, and still affect, a significant portion of the population, such translation applications could be very useful. Significant effort and backing should go into developing translation services for the local market. One example is the website translation service developed by Dialog, which, however, is available only for English-to-Sinhala translation. It is a small step, but it should motivate local developers to add Sinhala and Tamil to existing voice and text translation technologies, creating applications that can help break language barriers.
While not as fantastical as a tower reaching into heaven, we may soon be able to embark on the next great project: one that will hopefully help us understand one another a little better in the future.