Meta today announced a Massively Multilingual Speech (MMS) model that supports speech-to-text and text-to-speech for 1,107 languages and language identification for more than 4,000 languages. According to Meta's own evaluation, models trained on its Massively Multilingual Speech data achieve half the word error rate of OpenAI's Whisper while covering 11 times as many languages. In other words, Meta's new model holds up well against the best current speech models.

Why is Meta working on this project?

Existing speech recognition models cover only approximately 100 languages, a fraction of the 7,000+ known languages spoken on the planet. Even more concerning, nearly half of these languages are in danger of disappearing within our lifetime.

Meta wants to increase language coverage and, in the future, also tackle the challenge of handling dialects.

Meta is also open-sourcing its models and code so that researchers can build on them to help preserve the world's languages.