Google’s latest take on machine translation could make it easier for people to communicate with those speaking a different language, by translating speech directly into text in a language they understand.
Machine translation of speech normally works by first converting it into text, then translating that into text in another language. But any error in speech recognition will lead to an error in transcription and a mistake in the translation.
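As a toy illustration of how a recognition error carries through a two-step pipeline, consider the sketch below. It is not how any real system works: the tiny `ES_TO_EN` lexicon and the `cascade` function are hypothetical stand-ins for a learned translation model, and the "misheard" input simply simulates a speech recogniser confusing two similar-sounding Spanish words.

```python
# Hypothetical toy lexicon; real systems use learned models, not lookups.
ES_TO_EN = {"mi": "my", "casa": "house", "caza": "hunt"}


def translate_text(spanish_words):
    """Translate word by word, marking anything unknown."""
    return [ES_TO_EN.get(word, "<unk>") for word in spanish_words]


def cascade(recognised_words):
    """Second stage of a cascade: translate whatever the recogniser produced."""
    return translate_text(recognised_words)


heard_correctly = ["mi", "casa"]
misheard = ["mi", "caza"]  # simulated recognition error: "casa" heard as "caza"

print(cascade(heard_correctly))  # ['my', 'house']
print(cascade(misheard))         # ['my', 'hunt']
```

The translation step has no access to the audio, so it cannot recover from the recogniser's mistake.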
Researchers at Google Brain, the tech giant’s deep learning research arm, have turned to neural networks to cut out the middle step. By skipping transcription, the approach could potentially allow for more accurate and quicker translations.
The team trained its system on hundreds of hours of Spanish audio with corresponding English text. In each case, it used several layers of neural networks – computer systems loosely modelled on the human brain – to match sections of the spoken Spanish with the written translation. To do this, it analysed the waveform of the Spanish audio to learn which parts seemed to correspond with which chunks of written English. When it was then asked to translate, each neural layer used this knowledge to manipulate the audio waveform until it was turned into the corresponding section of written English.
Corresponding patterns
“It learns to find patterns of correspondence between the waveforms in the source language and the written text,” says Dzmitry Bahdanau at the University of Montreal in Canada, who wasn’t involved with the work.
After a learning period, Google’s system produced a better-quality English translation of Spanish speech than one that transcribed the speech into written Spanish first. It was evaluated using the BLEU score, which judges machine translations by how closely they match translations produced by a professional human translator.
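For readers curious about how a BLEU-style score works, here is a minimal sketch. It is a simplified, single-sentence version written for illustration, not the evaluation code used in the research: standard BLEU is usually computed over a whole test set and can use several reference translations. The function name `simple_bleu` and the example sentences are assumptions made for this sketch.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every run of n consecutive words."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against a single reference."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each n-gram count by how often it appears in the reference.
        overlap = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or overlap == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: punish candidates shorter than the reference.
    if len(cand) > len(ref):
        penalty = 1.0
    else:
        penalty = math.exp(1 - len(ref) / len(cand))
    return penalty * math.exp(sum(log_precisions) / max_n)


human = "the cat sat on the mat"
print(simple_bleu("the cat sat on the mat", human))  # identical: 1.0
print(simple_bleu("the cat sat on a mat", human))    # close: between 0 and 1
print(simple_bleu("dogs bark loudly", human))        # no shared words: 0.0
```

The score rewards a machine translation for sharing words and short phrases with the human one, which is why it rises as the output gets closer to the professional version.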
The system could be particularly useful for translating speech in languages that are spoken by very few people, says Sharon Goldwater at the University of Edinburgh in the UK.
International disaster relief teams, for instance, could use it to quickly put together a translation system to communicate with people they are trying to assist. When an earthquake hit Haiti in 2010, says Goldwater, there was no translation software available for Haitian Creole.
Goldwater’s team is using a similar method to translate speech from Arapaho, a language spoken by only 1000 or so people in the Native American tribe of the same name, and Ainu, a language spoken by a handful of people in Japan.
Rare languages
The system could also be used to translate languages that are rarely written down, since it doesn’t require a written version of the source language to produce successful translations.
Until it is tested on a much larger dataset, it’s hard to tell how the new approach really compares with more conventional translation systems, says Goldwater. But she thinks it could set the standard for future machine translation.
Some services already use machine translation to let people who speak different languages have conversations in real time. Skype introduced such a feature in 2014 and now supports nine languages, including Mandarin and Arabic as well as the most common European languages. But like other existing translation methods, Skype’s transcribes speech into text before translating that text into a different language.
And text translation service Google Translate already uses neural networks on its most popular language pairs, which lets it analyse entire sentences at once to figure out the best written translation. Intriguingly, this system appears to use an “interlingua” – a common representation of sentences that have the same meaning in different languages – to translate from one language to another, meaning it could translate between a language pair it hasn’t explicitly been trained on. The Google Brain researchers suggest the new speech-to-text approach may also be able to produce a system that can translate multiple languages.
But while machine translation keeps improving, it’s difficult to tell how neural networks are coming to their solutions, says Bahdanau. “It’s very hard to understand what’s happening inside.”
arXiv