Knowledge Distillation of Black-Box Large Language Models (2024)


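The paper "Knowledge Distillation of Black-Box Large Language Models" introduces **Proxy-KD**, a method for improving knowledge transfer from proprietary black-box large language models (LLMs) to smaller, more efficient models. Knowledge distillation (KD) from powerful models such as GPT-4 has attracted considerable attention. However, the teacher's internal states are inaccessible, and this often limits how well the student performs. Proxy-KD bridges this gap by introducing a proxy model that makes knowledge extraction more effective. Experiments show that the method improves on existing black-box distillation techniques and also outperforms traditional white-box KD. By working around the teacher's inaccessible internals, Proxy-KD offers a promising, scalable framework for distilling the capabilities of advanced closed-source LLMs into compact, practical models.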
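The following minimal sketch makes the white-box/black-box distinction concrete. It shows two generic per-token distillation losses over a toy four-token vocabulary. It is not Proxy-KD itself. The function names, the temperature of 2.0 and all logit values are illustrative assumptions.

White-box KD can match the teacher's full next-token distribution with a KL divergence. A black-box API exposes only the generated token, which leaves a one-hot cross-entropy target.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, optionally softened by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def white_box_loss(teacher_logits, student_logits, temperature=2.0):
    """White-box KD: match the teacher's full (softened) next-token distribution.
    The temperature of 2.0 is an illustrative assumption."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return kl_divergence(p_teacher, p_student)

def black_box_loss(teacher_token, student_logits):
    """Black-box KD: only the teacher's emitted token is visible, so the
    student is trained with cross-entropy on that single token."""
    p_student = softmax(student_logits)
    return -math.log(p_student[teacher_token])

# Toy example (made-up numbers). A black-box API would hide teacher_logits
# and return only the sampled token, here the argmax (index 0).
teacher_logits = [2.0, 1.5, 0.1, -1.0]
student_logits = [0.5, 0.2, 0.3, 0.0]
teacher_token = 0
print("white-box KL loss:", white_box_loss(teacher_logits, student_logits))
print("black-box CE loss:", black_box_loss(teacher_token, student_logits))
```

In this toy setup, the teacher also puts substantial probability on token 1. The KL loss passes that preference to the student, but the black-box loss sees only token 0. This lost signal is the gap a proxy model is meant to help close.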


[Submitted on 13 Jan 2024 (v1), last revised 9 Nov 2024 (this version, v2)]

Authors: Hongzhan Chen and 5 other authors


Abstract: Given the exceptional performance of proprietary large language models (LLMs) like GPT-4, recent research has increasingly focused on boosting the capabilities of smaller models through knowledge distillation (KD) from these powerful yet black-box teachers. While leveraging the high-quality outputs of these teachers is advantageous, the inaccessibility of their internal states often limits effective knowledge transfer. To overcome this limitation, we introduce Proxy-KD, a novel method that uses a proxy model to facilitate the efficient transfer of knowledge from black-box LLMs to smaller models. Our experiments show that Proxy-KD not only enhances the performance of KD from black-box teacher models but also surpasses traditional white-box KD techniques. This approach presents a compelling new avenue for distilling knowledge from advanced LLMs.
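The abstract does not specify how the proxy is trained or how its outputs enter the student's loss. One plausible reading is sketched below under stated assumptions, and it is not the authors' published objective. Suppose a white-box proxy model supplies soft next-token distributions that stand in for the inaccessible teacher's. The student's per-token loss then mixes two terms: cross-entropy on the black-box teacher's emitted token, and KL divergence to the proxy's distribution. The functions `proxy_kd_loss` and `sequence_loss`, the mixing weight `alpha` and the per-sequence averaging are all hypothetical.

```python
import math

def softmax(logits):
    """Convert logits to a probability distribution (max-subtracted for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def proxy_kd_loss(teacher_token, proxy_logits, student_logits, alpha=0.5):
    """Hypothetical per-token objective (NOT the paper's exact loss):
    alpha * CE(teacher's emitted token) + (1 - alpha) * KL(proxy || student).
    alpha is an illustrative weight, not a value reported by the authors."""
    p_proxy = softmax(proxy_logits)
    p_student = softmax(student_logits)
    # Hard-label term: the only signal a black-box teacher exposes.
    ce = -math.log(p_student[teacher_token])
    # Soft-label term: the white-box proxy stands in for the teacher's distribution.
    kl = sum(p * math.log(p / q) for p, q in zip(p_proxy, p_student) if p > 0)
    return alpha * ce + (1 - alpha) * kl

def sequence_loss(teacher_tokens, proxy_logits_seq, student_logits_seq, alpha=0.5):
    """Average the hypothetical per-token loss over one teacher-generated sequence."""
    losses = [
        proxy_kd_loss(tok, p_logits, s_logits, alpha)
        for tok, p_logits, s_logits in zip(teacher_tokens, proxy_logits_seq, student_logits_seq)
    ]
    return sum(losses) / len(losses)

# Toy example: a 2-token teacher output over a 3-token vocabulary (made-up numbers).
teacher_tokens = [0, 2]
proxy_logits_seq = [[2.0, 1.0, 0.0], [0.0, 0.5, 1.5]]
student_logits_seq = [[0.5, 0.5, 0.0], [0.2, 0.2, 0.2]]
print(sequence_loss(teacher_tokens, proxy_logits_seq, student_logits_seq))
```

Setting `alpha=1.0` reduces this sketch to plain black-box distillation on the teacher's text. Lowering `alpha` leans more on the proxy's soft labels, which only help to the extent that the proxy actually resembles the teacher.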

Submission history from: Hongzhan Chen
[v1] Sat, 13 Jan 2024 08:43:32 UTC (359 KB)
[v2] Sat, 9 Nov 2024 01:35:32 UTC (8,288 KB)

