We’re releasing a full suite of models and one dataset. Built on the foundation of FAIR’s previous research, Omnilingual ASR gives stakeholders everything they need to expand and improve speech technology for any language.
Both decoding variants are released as a versatile family of models, ranging from lightweight 300M versions designed for low-power devices to powerful 7B models that offer top-tier accuracy across a wide range of use cases. We're also releasing our general-purpose speech foundation model, wav2vec 2.0, in several sizes, which researchers and developers alike can use for speech-related tasks beyond ASR.
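As an illustration of what "tasks beyond ASR" can look like, the sketch below uses a wav2vec 2.0 encoder as a general speech feature extractor. It relies on Hugging Face's reference wav2vec 2.0 implementation with a tiny, randomly initialized configuration purely so the example runs quickly; the actual Omnilingual ASR checkpoints are distributed via fairseq2, and real use would load pretrained weights instead.

```python
# Sketch: a wav2vec 2.0 encoder as a generic speech feature extractor.
# Uses Hugging Face's wav2vec 2.0 implementation with a tiny random
# config for illustration only -- not the released Omnilingual ASR
# checkpoints, which ship via fairseq2 with pretrained weights.
import torch
from transformers import Wav2Vec2Config, Wav2Vec2Model

config = Wav2Vec2Config(
    hidden_size=32,          # tiny dimensions so the sketch runs instantly
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)
model = Wav2Vec2Model(config)
model.eval()

waveform = torch.randn(1, 16000)  # one second of 16 kHz audio
with torch.no_grad():
    features = model(waveform).last_hidden_state

# Each ~20 ms frame of audio becomes one hidden vector, usable as input
# for downstream tasks such as language ID or speaker verification.
print(features.shape)
```

The frame-level features, rather than decoded text, are what make such an encoder useful for non-ASR tasks; a small task-specific head can be trained on top of them.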
All model assets are released under a permissive Apache 2.0 license, while the data is provided under the CC-BY license. Both are built on FAIR's open source fairseq2 framework, empowering researchers, developers, and language advocates worldwide to advance and tailor speech solutions for their own use cases using the latest tools and technologies in the PyTorch ecosystem.
Omnilingual ASR also advances the state of multilingual ASR along more familiar dimensions. Its training corpus is one of the largest ever assembled for ASR in both volume and linguistic diversity, integrating publicly available datasets with community-sourced speech recordings collected through multiple partnerships.
To reach languages with little or no digital presence, we worked with local organizations that recruited and compensated native speakers, often in remote or under-documented regions. We’re releasing this commissioned part of our training corpus as Omnilingual ASR Corpus to further benefit the ASR research community. To date, it is the largest ultra-low-resource spontaneous ASR dataset ever made available, covering hundreds of languages never seen before by ASR systems. Explore the languages in the dataset here.
Beyond commissioned partnerships, collaborations through the Language Technology Partner Program have brought together linguists, researchers, and language communities from around the world, providing essential expertise and resources. We joined forces with organizations such as Mozilla Foundation’s Common Voice and Lanfrica/NaijaVoices to work directly with local communities.
These partnerships have been instrumental in infusing Omnilingual ASR with deep linguistic knowledge and cultural understanding, ensuring that the technology meets local needs and empowers diverse language communities globally.