Meta has introduced Massively Multilingual Speech (MMS), a new AI language model that can identify more than 4,000 spoken languages around the world and perform text-to-speech and speech-to-text for more than 1,100 of them. The technology aims to preserve language diversity and help people access information and use their devices in their preferred language.
What's interesting is that the tech giant used the Bible and other religious texts to build its language dataset. Of the more than 7,000 languages spoken worldwide, existing speech recognition models cover only about 100, leaving many people underserved, especially speakers of indigenous languages.
Meta's first challenge was collecting audio data for thousands of languages. To do so, it turned to the Bible, one of the most widely translated texts in the world and a common resource in text-based translation research. "These translations have publicly available audio recordings of people reading these texts in different languages," the company said. In addition to readings of the New Testament, Meta incorporated unlabeled recordings of other Christian religious readings, which increased the number of available languages to 4,000.
The company stressed that "while the content of the audio recordings is religious, our analysis shows that this doesn't bias the model to produce more religious language." Its researchers also found that the MMS models did not favor male voices over female voices, even though most audio Bibles are recorded by men. Meta also acknowledged that the models are imperfect. "For example, there is some risk that the speech-to-text model may mistranscribe select words or phrases," Meta wrote. "Depending on the output, this could result in offensive and/or inaccurate language."
Translation, especially into indigenous languages, is a costly and time-consuming process. Even so, Bible societies are working hard to provide accurate translations of Scripture, aiming to reach wider audiences and help them engage with the Bible and come to know the Lord in a language that speaks to them.
The United Bible Societies (UBS) reported a record number of new translations in a single year: 57 translations of the Bible (or parts of it) in 2022. As a result, 100 million people were able to read the Bible in their native language for the first time. However, nearly 4,000 languages are still waiting for a translation of the Bible of their own.
Meta is sharing its models and code with the public to encourage collaboration within the research community and to build on its work. It plans to support more languages, including dialects, which often pose additional challenges for the translation process.
“We envision a world where technology has the opposite effect, encouraging people to keep their languages alive since they can access information and use technology by speaking in their preferred language,” the company said.
Many of the world's languages are disappearing for various reasons, and Meta's newest technology aims to counter this trend. Translating languages makes information, such as the Bible, accessible to everyone. "The Massively Multilingual Speech project presents a significant step forward in this direction," Meta said.