Voxtral Transcribe 2

Original link: https://mistral.ai/news/voxtral-transcribe-2

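## Voxtral Transcribe 2: Next-Generation Speech-to-Text Models Released

Mistral AI has released Voxtral Transcribe 2, a family of advanced speech-to-text models designed for batch and real-time applications. **Voxtral Mini Transcribe V2** stands out for accuracy and cost efficiency in tasks such as meeting transcription. It offers speaker diarization, context biasing, and word-level timestamps, supports 13 languages, and achieves the lowest word error rate at $0.003 per minute. **Voxtral Realtime** is optimized for low-latency live transcription (configurable down to sub-200ms), making it well suited to voice assistants and conversational AI. Notably, it is released as **open weights** under the Apache 2.0 license, enabling privacy-focused edge deployment. Both models support 13 languages, and Mistral reports better accuracy than competitors such as GPT-4o mini Transcribe and Gemini 2.5 Flash. Users can try Voxtral Transcribe 2, including diarization and timestamps, in the new audio playground in Mistral Studio. The models target applications across industries such as contact centers, media, and compliance, with enterprise-grade features and secure deployment options.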

Hacker News discussion: Voxtral Transcribe 2 (mistral.ai), 26 points, posted by meetpateltech 54 minutes ago, 3 comments.

observationist (8 minutes ago): Native diarization, this looks exciting. https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-26... ~9GB model.

serf (8 minutes ago): Something I hate: "Click to try!" banners that link to a warning page saying "Oh, paid members only, whoops!" So you're not saying "try it", you're saying "buy this product." Don't pretend it's a free sample. I can't comment on the model: I won't give them money.

ReadEvalPost (4 minutes ago, in reply to serf): You can try it on HF: https://huggingface.co/spaces/mistralai/Voxtral-Mini-Realtim...

Original article

Today, we're releasing Voxtral Transcribe 2, two next-generation speech-to-text models with state-of-the-art transcription quality, diarization, and ultra-low latency. The family includes Voxtral Mini Transcribe V2 for batch transcription and Voxtral Realtime for live applications. Voxtral Realtime is open-weights under the Apache 2.0 license.

We're also launching an audio playground in Mistral Studio to test transcription instantly, powered by Voxtral Transcribe 2, with diarization and timestamps.

Highlights.

Voxtral Mini Transcribe V2: State-of-the-art transcription with speaker diarization, context biasing, and word-level timestamps in 13 languages.
Voxtral Realtime: Purpose-built for live transcription with latency configurable down to sub-200ms, enabling voice agents and real-time applications.
Best-in-class efficiency: Industry-leading accuracy at a fraction of the cost, with Voxtral Mini Transcribe V2 achieving the lowest word error rate, at the lowest price point.
Open weights: Voxtral Realtime ships under Apache 2.0, deployable on edge for privacy-first applications.

Voxtral Realtime.

Voxtral Realtime is purpose-built for applications where latency matters. Unlike approaches that adapt offline models by processing audio in chunks, Realtime uses a novel streaming architecture that transcribes audio as it arrives. The model delivers transcriptions with delay configurable down to sub-200ms, unlocking a new class of voice-first applications.
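To make "delay configurable down to sub-200ms" concrete, here is a toy model, not Voxtral's actual architecture: audio arrives in fixed-size frames, and a streaming decoder commits the transcription of a frame once a fixed number of future frames has arrived. The 80 ms frame size and the helper names are hypothetical; the point is only that a smaller delay budget means less lookahead before each word is committed, whereas a chunked offline approach waits for an entire chunk.

```python
FRAME_MS = 80  # hypothetical frame duration, not a documented Voxtral value


def lookahead_frames(delay_ms: int, frame_ms: int = FRAME_MS) -> int:
    """How many future frames fit inside the configured delay budget."""
    return max(0, delay_ms // frame_ms)


def commit_times(num_frames: int, delay_ms: int, frame_ms: int = FRAME_MS) -> list[int]:
    """Time (ms) at which each frame's transcription can be committed.

    Frame i has fully arrived at (i + 1) * frame_ms. With k frames of lookahead,
    it is committed once frame min(i + k, last) has arrived, so the commit lags
    the frame's arrival by at most the configured delay.
    """
    k = lookahead_frames(delay_ms, frame_ms)
    last = num_frames - 1
    return [(min(i + k, last) + 1) * frame_ms for i in range(num_frames)]


if __name__ == "__main__":
    for delay in (480, 2400):
        print(delay, "ms ->", lookahead_frames(delay), "frames of lookahead")
```

In this toy model, with a 480 ms delay, frame 0 (fully arrived at 80 ms) is committed at 560 ms, exactly 480 ms later.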

Figure: Word error rate (lower is better) across languages in the FLEURS transcription benchmark.

At 2.4 seconds of delay, ideal for subtitling, Realtime matches Voxtral Mini Transcribe V2, our latest batch model. At 480ms of delay, its word error rate stays within 1-2% of the batch model's, enabling voice agents with near-offline accuracy.
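Word error rate, the metric behind these comparisons, is the word-level edit distance between the model's output and a reference transcript, divided by the number of reference words: (substitutions + deletions + insertions) / N. A minimal sketch, assuming plain whitespace tokenization with no text normalization (benchmark evaluations usually normalize case and punctuation first):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Two-row dynamic programming over the word-level Levenshtein table.
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Dropping one word from a six-word reference gives a WER of 1/6, about 16.7%; a 4% WER corresponds to roughly one error per 25 reference words.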

The model is natively multilingual, achieving strong transcription performance in 13 languages: English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, and Dutch. With a 4B-parameter footprint, it runs efficiently on edge devices, ensuring privacy and security for sensitive deployments.
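A back-of-the-envelope estimate shows why a 4B-parameter model is practical on edge hardware. This sketch counts weights only, taking 4B as a round number; activations, audio buffers, and runtime overhead come on top, and which precisions the released weights or a given runtime actually use is an assumption here (int8/int4 would require quantization).

```python
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate weight storage in GB (10**9 bytes), ignoring activations and overhead."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: ~{weight_memory_gb(4e9, dtype):.0f} GB")
```

At 16-bit precision, 4 billion parameters need about 8 GB for the weights alone.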

We’re releasing the model weights under Apache 2.0.

Voxtral Mini Transcribe V2.

Figure: Voxtral 2.0 average diarization error rate and price per minute. Average diarization error rate (lower is better) across five English benchmarks (Switchboard, CallHome, AMI-IHM, AMI-SDM, SBCSAE) and the TalkBank multilingual benchmark (German, Spanish, English, Chinese, Japanese).
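Diarization error rate is conventionally the fraction of reference speech time that is misattributed: (false alarm + missed speech + speaker confusion) / total reference speech time. The sketch below only applies that formula to durations that have already been measured. Obtaining them requires aligning hypothesis and reference segments under an optimal speaker mapping, which is omitted here, and the numbers in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class DiarizationErrors:
    false_alarm_s: float  # hypothesis speech where the reference has none
    missed_s: float       # reference speech the hypothesis does not cover
    confusion_s: float    # speech attributed to the wrong speaker
    total_ref_s: float    # total reference speech time


def diarization_error_rate(e: DiarizationErrors) -> float:
    """DER = (false alarm + missed + confusion) / total reference speech time."""
    if e.total_ref_s <= 0:
        raise ValueError("total reference speech time must be positive")
    return (e.false_alarm_s + e.missed_s + e.confusion_s) / e.total_ref_s


# Illustrative numbers only: 6 s of errors over 120 s of reference speech.
print(diarization_error_rate(DiarizationErrors(1.5, 2.0, 2.5, 120.0)))
```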

Figure: Voxtral 2.0 transcription performance on FLEURS and price per minute. Average word error rate (lower is better) across the top-10 languages in the FLEURS transcription benchmark.

Voxtral Mini Transcribe V2 delivers significant improvements in transcription and diarization quality across languages and domains. At approximately 4% word error rate on FLEURS and $0.003/min, Voxtral offers the best price-performance of any transcription API. It outperforms GPT-4o mini Transcribe, Gemini 2.5 Flash, Assembly Universal, and Deepgram Nova on accuracy, and processes audio approximately 3x faster than ElevenLabs’ Scribe v2 while matching it on quality at one-fifth of the cost.

Enterprise-ready features.

Voxtral Mini Transcribe V2 introduces key capabilities for enterprise deployments.

Speaker diarization.

Generate transcriptions with speaker labels and precise start/end times. Ideal for meeting transcription, interview analysis, and multi-party call processing. Note: with overlapping speech, the model typically transcribes one speaker.
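Diarized output is typically consumed as a list of speaker-labelled segments. The sketch below assumes a hypothetical segment shape (`speaker`, `start`, `end`, `text`), not the documented API response format. It shows two common post-processing steps: merging consecutive segments from the same speaker into turns, and totalling talk time per speaker.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str  # hypothetical label, e.g. "speaker_0"
    start: float  # seconds
    end: float    # seconds
    text: str


def merge_turns(segments: list[Segment], max_gap: float = 1.0) -> list[Segment]:
    """Merge consecutive same-speaker segments separated by at most max_gap seconds."""
    turns: list[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if turns and turns[-1].speaker == seg.speaker and seg.start - turns[-1].end <= max_gap:
            prev = turns[-1]
            turns[-1] = Segment(prev.speaker, prev.start, max(prev.end, seg.end), f"{prev.text} {seg.text}")
        else:
            turns.append(seg)
    return turns


def talk_time(segments: list[Segment]) -> dict[str, float]:
    """Total speaking time per speaker label, in seconds."""
    totals: dict[str, float] = {}
    for seg in segments:
        totals[seg.speaker] = totals.get(seg.speaker, 0.0) + (seg.end - seg.start)
    return totals


segments = [
    Segment("speaker_0", 0.0, 2.0, "Hello,"),
    Segment("speaker_0", 2.3, 4.0, "thanks for calling."),
    Segment("speaker_1", 4.5, 6.0, "Hi, I have a question."),
]
for turn in merge_turns(segments):
    print(f"[{turn.start:.1f}-{turn.end:.1f}] {turn.speaker}: {turn.text}")
print(talk_time(segments))
```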

Context biasing.

Provide up to 100 words or phrases to guide the model toward correct spellings of names, technical terms, or domain-specific vocabulary. Particularly useful for proper nouns or industry terminology that standard models often miss. Context biasing is optimized for English; support for other languages is experimental.
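Because context biasing accepts up to 100 words or phrases, it can help to clean up a domain glossary client-side before sending it. A minimal sketch: `prepare_bias_terms` is a hypothetical helper, and how the list is actually passed to the API (parameter name, format) is not shown here and should be taken from Mistral's documentation.

```python
MAX_BIAS_TERMS = 100  # limit stated in the announcement


def prepare_bias_terms(terms: list[str], limit: int = MAX_BIAS_TERMS) -> list[str]:
    """Collapse whitespace, drop empty and case-insensitive duplicate terms
    (keeping the first spelling seen), and raise if more than `limit` remain."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for term in terms:
        term = " ".join(term.split())  # trim and collapse internal whitespace
        key = term.casefold()
        if not term or key in seen:
            continue
        seen.add(key)
        cleaned.append(term)
    if len(cleaned) > limit:
        raise ValueError(f"{len(cleaned)} unique terms exceeds the limit of {limit}")
    return cleaned


print(prepare_bias_terms(["Voxtral", " voxtral ", "Le  Chat", ""]))
```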

Word-level timestamps.

Generate precise start and end timestamps for each word, enabling applications like subtitle generation, audio search, and content alignment.
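Word-level timestamps map directly onto subtitle formats. The sketch below groups words into SubRip (SRT) cues, assuming a hypothetical list of `(word, start_s, end_s)` tuples; the real response schema may differ. The grouping rule (at most 7 words or 3 seconds per cue) is an arbitrary choice for illustration.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(
    words: list[tuple[str, float, float]], max_words: int = 7, max_duration: float = 3.0
) -> str:
    """Group (word, start, end) tuples into numbered SRT cues."""
    cues: list[list[tuple[str, float, float]]] = []
    for w in words:
        # Extend the current cue if it has room and would not exceed max_duration.
        if cues and len(cues[-1]) < max_words and w[2] - cues[-1][0][1] <= max_duration:
            cues[-1].append(w)
        else:
            cues.append([w])
    blocks = []
    for n, cue in enumerate(cues, start=1):
        text = " ".join(word for word, _, _ in cue)
        blocks.append(f"{n}\n{srt_time(cue[0][1])} --> {srt_time(cue[-1][2])}\n{text}\n")
    return "\n".join(blocks)


print(words_to_srt([("Hello", 0.0, 0.4), ("world", 0.5, 0.9)]))
```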

Expanded language support.

Like Realtime, this model now supports 13 languages: English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, and Dutch. Its non-English performance significantly outpaces that of competitors.

Noise robustness.

Maintains transcription accuracy in challenging acoustic environments, such as factory floors, busy call centers, and field recordings.

Longer audio support.

Process recordings up to 3 hours in a single request.
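Recordings longer than the 3-hour per-request limit have to be split before upload. A minimal sketch that computes chunk boundaries with a small overlap so that words at a cut are not lost. The 10-second overlap is an arbitrary assumption, and de-duplicating text transcribed twice in the overlap is left to the caller.

```python
MAX_REQUEST_S = 3 * 60 * 60  # 3 hours, per the announcement


def chunk_bounds(
    duration_s: float, max_len_s: float = MAX_REQUEST_S, overlap_s: float = 10.0
) -> list[tuple[float, float]]:
    """Return (start, end) windows of at most max_len_s seconds covering [0, duration_s]."""
    if overlap_s >= max_len_s:
        raise ValueError("overlap must be shorter than the chunk length")
    bounds = []
    start = 0.0
    while True:
        end = min(start + max_len_s, duration_s)
        bounds.append((start, end))
        if end >= duration_s:
            return bounds
        start = end - overlap_s  # step back so consecutive chunks overlap


print(chunk_bounds(4 * 60 * 60))  # a 4-hour recording
```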

Figure: Word error rate (lower is better) across languages in the FLEURS transcription benchmark.

Audio playground.

Test Voxtral Transcribe 2 directly in Mistral Studio. Upload up to 10 audio files, toggle diarization, choose timestamp granularity, and add context bias terms for domain-specific vocabulary. Supports .mp3, .wav, .m4a, .flac, .ogg up to 1GB each.

Transforming voice applications.

Voxtral powers voice workflows in diverse applications and industries.

Meeting intelligence.

Transcribe multilingual recordings with speaker diarization that clearly attributes who said what and when. At Voxtral's price point, large volumes of meeting content can be annotated with industry-leading cost efficiency.

Voice agents and virtual assistants.

Build conversational AI with sub-200ms transcription latency. Connect Voxtral Realtime to your LLM and TTS pipeline for responsive voice interfaces that feel natural.
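The STT, LLM, and TTS loop can be sketched as three asyncio stages connected by queues, so each stage starts working as soon as its input arrives rather than waiting for the whole utterance pipeline to finish. Every stage here (`transcribe_stream`, `generate_reply`, `synthesize`) is a hypothetical stub standing in for a real streaming client; none of them is a Mistral API.

```python
import asyncio


async def transcribe_stream(audio_chunks, out: asyncio.Queue) -> None:
    """Stub STT stage: pretend each audio chunk yields one finalized utterance."""
    for chunk in audio_chunks:
        await asyncio.sleep(0)  # placeholder for streaming transcription
        await out.put(f"text({chunk})")
    await out.put(None)  # end-of-stream marker


async def generate_reply(inp: asyncio.Queue, out: asyncio.Queue) -> None:
    """Stub LLM stage: turn each utterance into a reply."""
    while (utterance := await inp.get()) is not None:
        await out.put(f"reply to {utterance}")
    await out.put(None)


async def synthesize(inp: asyncio.Queue, played: list) -> None:
    """Stub TTS stage: record what would be spoken."""
    while (reply := await inp.get()) is not None:
        played.append(f"audio[{reply}]")


async def run_agent(audio_chunks) -> list:
    stt_to_llm: asyncio.Queue = asyncio.Queue()
    llm_to_tts: asyncio.Queue = asyncio.Queue()
    played: list = []
    # Run all three stages concurrently; queues carry data between them.
    await asyncio.gather(
        transcribe_stream(audio_chunks, stt_to_llm),
        generate_reply(stt_to_llm, llm_to_tts),
        synthesize(llm_to_tts, played),
    )
    return played


if __name__ == "__main__":
    print(asyncio.run(run_agent(["c1", "c2"])))
```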

Contact center automation.

Transcribe calls in real time, enabling AI systems to analyze sentiment, suggest responses, and populate CRM fields while conversations are still happening. Speaker diarization ensures clear attribution between agents and customers.

Media and broadcast.

Generate live multilingual subtitles with minimal latency. Context biasing handles proper nouns and technical terminology that trip up generic transcription services.

Compliance and documentation.

Monitor and transcribe interactions for regulatory compliance, with diarization providing clear speaker attribution and timestamps enabling precise audit trails.

Both models support GDPR and HIPAA-compliant deployments through secure on-premise or private cloud setups.

Get started.

Voxtral Mini Transcribe V2 is available now via API at $0.003 per minute. Try it now in the new Mistral Studio audio playground or in Le Chat.

Voxtral Realtime is available via API at $0.006 per minute and as open weights on Hugging Face.
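At the listed prices, cost scales linearly with audio minutes. A small estimator using the two published per-minute rates; the dictionary keys are descriptive labels, not API model identifiers, and the monthly volume in the example is hypothetical.

```python
PRICE_PER_MIN_USD = {
    "mini_transcribe_v2": 0.003,  # batch API price from the announcement
    "realtime": 0.006,            # Realtime API price from the announcement
}


def transcription_cost_usd(minutes: float, model: str) -> float:
    """Linear cost estimate: audio minutes times the per-minute price."""
    return minutes * PRICE_PER_MIN_USD[model]


# Hypothetical workload: 10,000 hours of call audio per month.
minutes = 10_000 * 60
for model in PRICE_PER_MIN_USD:
    print(f"{model}: ${transcription_cost_usd(minutes, model):,.2f}")
```

For 10,000 hours (600,000 minutes), that comes to $1,800 with Voxtral Mini Transcribe V2 and $3,600 through the Realtime API.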

Explore documentation on Mistral’s audio and transcription capabilities.

We’re hiring.

If you're excited about building world-class speech AI and putting frontier models into the hands of developers everywhere, we'd love to hear from you. Apply to join our team.