Omnilingual ASR: Advancing automatic speech recognition for 1600 languages

Original link: https://ai.meta.com/blog/omnilingual-asr-advancing-automatic-speech-recognition/?_fb_noscript=1

FAIR is releasing a comprehensive suite of open source models and the Omnilingual ASR Corpus to advance speech technology for all languages. The release includes general-purpose automatic speech recognition (ASR) models built on wav2vec 2.0, ranging from lightweight versions to a high-accuracy 7B variant, under permissive licenses (Apache 2.0 and CC-BY). A key focus is extending ASR to underrepresented languages: the released dataset is the largest ultra-low-resource spontaneous ASR dataset created to date, covering hundreds of previously unsupported languages, made possible through collaboration with local organizations and native speakers in remote regions. Partnerships with organizations such as Mozilla's Common Voice and Lanfrica/NaijaVoices help ensure linguistic accuracy and cultural relevance. The initiative enables researchers, developers, and language communities worldwide to build and customize speech solutions using the latest PyTorch tooling.


Original text

We’re releasing a full suite of models and one dataset. Built on the foundation of FAIR’s previous research, Omnilingual ASR gives stakeholders everything they need to expand and improve speech technology for any language.

Both decoding variants, a CTC-based decoder and an LLM-based decoder, are available as a versatile family of models: from lightweight 300M versions designed for low-power devices to powerful 7B models that offer top-tier accuracy for a variety of use cases. Our general-purpose speech foundation model wav2vec 2.0 is also made available at various sizes. It can be used by researchers and developers alike to enable speech-related tasks beyond ASR.
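As a rough sketch of what inference could look like, the snippet below follows the usage pattern suggested by the facebookresearch/omnilingual-asr repository; the module path, class name, model card, and language tag are assumptions rather than a confirmed API.

```python
# Hypothetical sketch: all names here are assumptions modeled on the
# facebookresearch/omnilingual-asr repository, not a confirmed API.
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline

# Choose a model card: e.g., a lightweight 300M variant for low-power
# devices or the 7B LLM-decoder variant for top accuracy (names assumed).
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")

# Transcribe local audio, passing a language tag (tag scheme assumed).
transcripts = pipeline.transcribe(["clip.wav"], lang=["eng_Latn"], batch_size=1)
print(transcripts[0])
```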

All model assets are released under a permissive Apache 2.0 license, while the data is provided under a CC-BY license. The models are built on FAIR's open source fairseq2 framework, empowering researchers, developers, and language advocates worldwide to advance and tailor speech solutions for their own use cases using the latest tools and technologies in the PyTorch ecosystem.
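As one concrete example of building on such assets in the PyTorch ecosystem, the sketch below pulls utterance-level features out of a wav2vec 2.0 encoder for use beyond ASR. It uses torchaudio's public WAV2VEC2_BASE bundle as a stand-in, since the released Omnilingual checkpoints would be loaded through fairseq2 instead; the audio file name is a placeholder.

```python
import torch
import torchaudio

# Stand-in sketch: torchaudio's public wav2vec 2.0 bundle is used for
# illustration; the Omnilingual encoders would be loaded via fairseq2.
bundle = torchaudio.pipelines.WAV2VEC2_BASE
model = bundle.get_model().eval()

waveform, sample_rate = torchaudio.load("clip.wav")  # placeholder file
if sample_rate != bundle.sample_rate:
    waveform = torchaudio.functional.resample(waveform, sample_rate, bundle.sample_rate)

with torch.inference_mode():
    layer_outputs, _ = model.extract_features(waveform)

# Mean-pool the final layer into one embedding per utterance, usable for
# downstream tasks such as language identification or speaker clustering.
embedding = layer_outputs[-1].mean(dim=1)
print(embedding.shape)  # (batch, feature_dim)
```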

Omnilingual ASR also advances the state of multilingual ASR along more familiar dimensions. Its training corpus is one of the largest ever assembled for ASR in both volume and linguistic diversity, integrating publicly available datasets with community-sourced speech recordings collected through multiple partnerships.

To reach languages with little or no digital presence, we worked with local organizations that recruited and compensated native speakers, often in remote or under-documented regions. We’re releasing this commissioned part of our training corpus as Omnilingual ASR Corpus to further benefit the ASR research community. To date, it is the largest ultra-low-resource spontaneous ASR dataset ever made available, covering hundreds of languages never seen before by ASR systems. Explore the languages in the dataset here.
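For those who want to explore the corpus programmatically, a minimal loading sketch follows; the Hugging Face dataset ID and field names are assumptions for illustration, so check the official release page for the actual identifiers.

```python
from datasets import load_dataset

# Assumed dataset ID and schema; consult the official release for the
# actual identifiers before running this.
corpus = load_dataset(
    "facebook/omnilingual-asr-corpus",  # assumed Hugging Face dataset ID
    split="train",
    streaming=True,  # stream rather than download hundreds of languages
)

for example in corpus:
    audio = example["audio"]["array"]  # raw waveform samples (assumed field)
    text = example["text"]             # reference transcription (assumed field)
    break
```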

Beyond commissioned partnerships, collaborations through the Language Technology Partner Program have brought together linguists, researchers, and language communities from around the world, providing essential expertise and resources. We joined forces with organizations such as Mozilla Foundation’s Common Voice and Lanfrica/NaijaVoices to work directly with local communities.

These partnerships have been instrumental in infusing Omnilingual ASR with deep linguistic knowledge and cultural understanding, ensuring that the technology meets local needs and empowers diverse language communities globally.
