VaultGemma: The most capable differentially private LLM

Original link: https://research.google/blog/vaultgemma-the-worlds-most-capable-differentially-private-llm/

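## VaultGemma: A Privacy-Focused LLM - Summary

Google Research recently released VaultGemma, a new lightweight, open-weight large language model (LLM) trained with **differential privacy (DP)**. DP adds calibrated statistical noise during training (the "statistical magic" that "blurs" the training data), so the model is very unlikely to reveal information from its training set even when prompted directly. Put differently, the model's outputs are nearly indistinguishable whether or not any particular piece of data was included in training. Potential applications include training on sensitive data, such as medical records or user data like Gmail inboxes, with a greatly reduced risk of privacy leaks. Although DP adds computational overhead (training required TPUs), the resulting model *can* be self-hosted and run on standard GPUs.

The discussion focused on the approach's potential to ease copyright concerns, enable new ways of scaling training data, and ultimately make AI development more ethical and privacy-respecting. Some commenters, however, argued that Google's main motivation may be to facilitate ad targeting while appearing privacy-conscious.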
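To make the idea of "blurring" the training data more concrete, here is a minimal sketch of the general recipe behind differentially private training: clip each example's gradient so that no single record can dominate an update, then add Gaussian noise scaled to that clipping bound. This is an illustration under stated assumptions, not VaultGemma's actual training code. The function name `dp_mean_gradient` and the parameters `clip_norm` and `noise_multiplier` are hypothetical names chosen for this example, and the toy gradients are made up.

```python
import numpy as np

def dp_mean_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average.

    Hypothetical helper for illustration only; not from the VaultGemma codebase.
    """
    grads = np.asarray(per_example_grads, dtype=float)
    # Per-example L2 norms, shape (batch, 1).
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    # Shrink any gradient whose norm exceeds clip_norm; leave smaller ones unchanged.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = grads * scale
    # Noise standard deviation is proportional to the clipping bound,
    # i.e. to the largest influence any single example can have.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grads.shape[1])
    return (clipped.sum(axis=0) + noise) / len(grads)

rng = np.random.default_rng(0)
# Toy batch of three per-example gradients (made-up values).
batch = np.array([[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]])
noisy = dp_mean_gradient(batch, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Because every example's contribution is capped at `clip_norm` and then masked by noise of comparable size, adding or removing one record changes the update only slightly. That is the intuition behind the claim that the model behaves nearly the same whether or not a particular piece of data was in the training set. Turning a choice of `noise_multiplier` into a formal privacy guarantee requires a separate privacy accounting step that this sketch omits.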

We'd like to thank the entire Gemma and Google Privacy teams for their contributions and support throughout this project, in particular, Peter Kairouz, Brendan McMahan and Dan Ramage for feedback on the blog post, Mark Simborg and Kimberly Schwede for help with visualizations, and the teams at Google that helped with algorithm design, infrastructure implementation, and production maintenance. The following people directly contributed to the work presented here (ordered alphabetically): Borja Balle, Zachary Charles, Christopher A. Choquette-Choo, Lynn Chua, Prem Eruvbetine, Badih Ghazi, Steve He, Yangsibo Huang, Armand Joulin, George Kaissis, Pritish Kamath, Ravi Kumar, Daogao Liu, Ruibo Liu, Pasin Manurangsi, Thomas Mesnard, Andreas Terzis, Tris Warkentin, Da Yu, and Chiyuan Zhang.
