The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks with 90% Fewer Parameters
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks (2018)

Original link: https://arxiv.org/abs/1803.03635

The "lottery ticket hypothesis" proposes that large, randomly initialized neural networks contain smaller subnetworks ("winning tickets") that can match the accuracy of the original network with far fewer parameters. The work shows that these winning tickets are not *created by* training; they are *discovered*, via pruning, within the dense network's initial random weights. The key finding is that when these subnetworks are trained in isolation, keeping their original initialization, they learn faster and often reach higher test accuracy than training the full network from scratch. A winning ticket represents a fortunate initialization, a "lottery win" in which the initial weights happen to be particularly well suited to effective training. Experiments on MNIST and CIFAR10 consistently identify winning tickets at 10-20% of the original network's size, supporting the hypothesis and underscoring the importance of initial weight configurations in neural network training. This points toward more efficient training and smaller, faster models.

## The Lottery Ticket Hypothesis: Revisited (2018)

A 2018 paper proposed the "lottery ticket hypothesis": large, randomly initialized neural networks contain smaller subnetworks ("winning tickets") that can reach comparable accuracy. The hypothesis is now being revisited. The original author, Jonathan Frankle, has reportedly distanced himself from the work, citing its impracticality on commercial hardware. While the core observation that sparse subnetworks can work well still holds, identifying these winning tickets currently requires training the large dense model first, which negates the potential efficiency gains. The discussion emphasizes that the original excitement stemmed from what the hypothesis implies about *how* networks learn, regardless of commercial viability. Concerns about hardware constraints and the difficulty of applying the idea beyond small models were raised early on. Some also argue that the hypothesis merely reflects the norm invariances and overparameterization inherent to neural networks, under which many weight configurations can approximate the desired function. Although attention has faded, research on sparse networks and architecture decisions continues, with recent work such as Nemotron 3 showing sustained interest.

## Original Paper

The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, by Jonathan Frankle and Michael Carbin

Abstract: Neural network pruning techniques can reduce the parameter counts of trained networks by over 90%, decreasing storage requirements and improving computational performance of inference without compromising accuracy. However, contemporary experience is that the sparse architectures produced by pruning are difficult to train from the start, which would similarly improve training performance.
We find that a standard pruning technique naturally uncovers subnetworks whose initializations made them capable of training effectively. Based on these results, we articulate the "lottery ticket hypothesis:" dense, randomly-initialized, feed-forward networks contain subnetworks ("winning tickets") that - when trained in isolation - reach test accuracy comparable to the original network in a similar number of iterations. The winning tickets we find have won the initialization lottery: their connections have initial weights that make training particularly effective.
We present an algorithm to identify winning tickets and a series of experiments that support the lottery ticket hypothesis and the importance of these fortuitous initializations. We consistently find winning tickets that are less than 10-20% of the size of several fully-connected and convolutional feed-forward architectures for MNIST and CIFAR10. Above this size, the winning tickets that we find learn faster than the original network and reach higher test accuracy.
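The identification procedure described in the abstract is iterative magnitude pruning: train the network, prune the smallest-magnitude weights, rewind the survivors to their original initial values, and repeat. A minimal sketch of that loop is below; `train_fn`, `find_winning_ticket`, and the per-round pruning fraction are illustrative placeholders, not the authors' actual implementation (which trains real networks layer by layer).

```python
import numpy as np

def prune_smallest(weights, mask, frac):
    """One pruning round: zero out the fraction `frac` of still-active
    weights with the smallest magnitudes, keeping the rest."""
    active = np.flatnonzero(mask)
    k = int(len(active) * frac)
    # indices of the k smallest-magnitude active weights
    drop = active[np.argsort(np.abs(weights[active]))[:k]]
    new_mask = mask.copy()
    new_mask[drop] = 0.0
    return new_mask

def find_winning_ticket(init_weights, train_fn, rounds=3, frac_per_round=0.5):
    """Iterative magnitude pruning (hypothetical helper): train the masked
    network, prune by trained magnitude, then rewind the surviving weights
    to their ORIGINAL initialization before the next round."""
    mask = np.ones_like(init_weights)
    for _ in range(rounds):
        trained = train_fn(init_weights * mask)      # train the masked subnetwork
        mask = prune_smallest(trained, mask, frac_per_round)
        # survivors are rewound to init_weights, per the hypothesis
    return mask, init_weights * mask                 # candidate winning ticket

# Toy usage: with an identity "training" step, pruning keeps the
# largest-magnitude initial weight after three halving rounds.
init = np.array([0.1, -0.4, 0.3, -0.2, 0.5, -0.6, 0.7, 0.05])
mask, ticket = find_winning_ticket(init, lambda w: w)
```

The crucial design choice, and the paper's central claim, is the rewind step: resetting the surviving connections to their initial values (rather than keeping trained values or re-randomizing) is what makes the sparse subnetwork trainable in isolation.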
From: Jonathan Frankle
[v1] Fri, 9 Mar 2018 18:51:28 UTC (6,917 KB)
[v2] Mon, 23 Apr 2018 19:58:09 UTC (8,072 KB)
[v3] Sun, 20 May 2018 19:46:47 UTC (8,173 KB)
[v4] Tue, 27 Nov 2018 20:03:01 UTC (7,782 KB)
[v5] Mon, 4 Mar 2019 15:51:11 UTC (2,029 KB)