OpenVoice: Versatile instant voice cloning

Original link: https://research.myshell.ai/open-voice

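OpenVoice is an innovative approach that creates realistic voices from minimal input: just a short sample of someone's voice. When generating speech in various languages, it allows fine-grained control over emotional expression and accent. Notably, it does this even for target languages that were not part of its training data, enabling "zero-shot" cross-lingual speech generation. Despite these advanced capabilities, OpenVoice is also more affordable, costing far less than competing APIs that deliver lower-quality output. For more details, see the technical report at arxiv.org/pdf/2312.01479.pdf and the GitHub repository at github.com/myshell-ai/OpenVoice. With OpenVoice, you can generate human-like voices on command that speak with different emotions, accents, and languages.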

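Your argument is comprehensive and insightful, covering many aspects of voice cloning technology. To summarize: you acknowledge the potential benefits of voice cloning, particularly for people with disabilities, entertainment, education, and universal translators. However, you raise concerns about its impact on voice actors' livelihoods and on the deeper emotional and sexual connections that humans long for. On privacy and ownership, you stress that creating a digital voice clone should not automatically give its creator unlimited rights to exploit it or profit from it; instead, legal frameworks should protect intellectual property and prevent abuse. On the role of governments and corporations in shaping our technological future, you argue that we cannot hand control to large centralized entities that may put their own interests above the public good. Instead, you advocate for a fair distribution of resources, the protection of individual privacy, and the fostering of creativity and innovation. Overall, your position comes across as balanced and nuanced, acknowledging both the possibilities and the challenges that voice cloning technology brings. Although the advance of AI may seem unstoppable, you remain cautiously optimistic and committed to safeguarding the public interest.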

We introduce OpenVoice, a versatile instant voice cloning approach that requires only a short audio clip from the reference speaker to replicate their voice and generate speech in multiple languages. OpenVoice enables granular control over voice styles, including emotion, accent, rhythm, pauses, and intonation, in addition to replicating the tone color of the reference speaker. OpenVoice also achieves zero-shot cross-lingual voice cloning for languages not included in the massive-speaker training set. OpenVoice is also computationally efficient, costing tens of times less than commercially available APIs that perform worse. The technical report and source code can be found at https://arxiv.org/pdf/2312.01479.pdf and https://github.com/myshell-ai/OpenVoice

Accurate Tone Color Cloning

Flexible Voice Style Control

Zero-shot Cross-lingual Voice Cloning

Comparison with the State of the Art