Normalizing Flows Are Capable Generative Models

Original link: https://machinelearning.apple.com/research/normalizing-flows

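This paper introduces TarFlow, a novel, high-performing normalizing flow (NF) architecture. Although recent attention has gone to other generative models, the authors show that NFs are more powerful than previously believed. TarFlow is a Transformer-based variant of Masked Autoregressive Flows (MAFs): it applies a stack of autoregressive Transformer blocks to image patches, alternating the autoregression direction between layers to strengthen representation learning.

The authors propose three key techniques for improving sample quality: Gaussian noise augmentation during training, a post-training denoising procedure, and a guidance method for both class-conditional and unconditional generation. Combined, these let TarFlow set new state-of-the-art results on image likelihood estimation, beating the previous best methods by a large margin. TarFlow also generates samples whose quality and diversity are comparable to those of diffusion models, the first time a stand-alone NF model has reached this level. The model is trained end to end and can directly model and generate pixels.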
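The stacked, direction-alternating autoregressive design summarized above can be illustrated with a small toy. This is a minimal sketch and not the authors' implementation: it assumes scalar sequence positions instead of image patches, uses strictly causal linear maps as a stand-in for the autoregressive Transformer blocks, and uses the MAF-style affine update z_t = (x_t - mu_t) * exp(-alpha_t). All function names are hypothetical.

```python
import math
import random


def make_causal_weights(rng, dim, scale):
    """Strictly lower-triangular weights: row t only sees positions before t."""
    return [[rng.gauss(0.0, scale) if j < t else 0.0 for j in range(dim)]
            for t in range(dim)]


def make_layer(rng, dim):
    # Toy stand-in (assumption) for one autoregressive Transformer block:
    # one causal map for the shift mu, one for the log-scale alpha.
    return make_causal_weights(rng, dim, 0.5), make_causal_weights(rng, dim, 0.1)


def shift_and_log_scale(w_mu, w_alpha, prefix, t):
    """Compute mu_t and alpha_t from the first t entries of `prefix` only."""
    mu = sum(w_mu[t][j] * prefix[j] for j in range(t))
    alpha = math.tanh(sum(w_alpha[t][j] * prefix[j] for j in range(t)))
    return mu, alpha


def flow_forward(x, layers):
    """Map data x to latent z; also return log|det dz/dx|."""
    z, log_det = list(x), 0.0
    for i, (w_mu, w_alpha) in enumerate(layers):
        if i % 2 == 1:
            z = z[::-1]  # alternate the autoregression direction
        out = []
        for t in range(len(z)):
            mu, alpha = shift_and_log_scale(w_mu, w_alpha, z, t)
            out.append((z[t] - mu) * math.exp(-alpha))
            log_det -= alpha  # triangular Jacobian: diagonal is exp(-alpha_t)
        z = out
    return z, log_det


def flow_inverse(z, layers):
    """Map latent z back to data x, one position at a time per layer."""
    x = list(z)
    for i in reversed(range(len(layers))):
        w_mu, w_alpha = layers[i]
        out = []
        for t in range(len(x)):
            # Only already-reconstructed positions (out[:t]) are read here.
            mu, alpha = shift_and_log_scale(w_mu, w_alpha, out, t)
            out.append(x[t] * math.exp(alpha) + mu)
        x = out[::-1] if i % 2 == 1 else out  # undo the direction flip
    return x


def log_likelihood(x, layers):
    """Exact log-density under a standard normal prior on z."""
    z, log_det = flow_forward(x, layers)
    log_prior = sum(-0.5 * v * v - 0.5 * math.log(2 * math.pi) for v in z)
    return log_prior + log_det


rng = random.Random(0)
layers = [make_layer(rng, 6) for _ in range(4)]
x = [rng.gauss(0.0, 1.0) for _ in range(6)]
z, log_det = flow_forward(x, layers)
x_rec = flow_inverse(z, layers)
```

In this sketch the forward pass can compute every position of a layer from the same input, while the inverse (sampling) pass has to fill in positions sequentially, which is the usual trade-off of autoregressive flows. Flipping the order on alternate layers changes which positions each entry can condition on without affecting the Jacobian determinant.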

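Apple's research on normalizing flows as generative models prompted a discussion on Hacker News. The article emphasizes that composable normalizing flows can be comparable to diffusion models. One key advantage is that they are deterministic and invertible, so the same input yields the same output across devices, which helps caching and user trust.

Commenters discussed the trade-offs between local (on-device) and server-side AI processing. Server-side processing offers shared resources and upgrades, while local processing improves privacy and allows offline use, possibly at the cost of hardware investment. Some argued that server-side processing leads to inefficient model handling and higher power consumption, while client-side AI is limited by device capabilities.

The discussion also touched on the computational demands of running large language models locally, including concerns about graphics card requirements and power consumption. Normalizing flows, built with techniques such as Transformer architectures, are currently being explored as an alternative to diffusion models.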

Original text

Normalizing Flows (NFs) are likelihood-based models for continuous inputs. They have demonstrated promising results on both density estimation and generative modeling tasks, but have received relatively little attention in recent years. In this work, we demonstrate that NFs are more powerful than previously believed. We present TarFlow: a simple and scalable architecture that enables highly performant NF models. TarFlow can be thought of as a Transformer-based variant of Masked Autoregressive Flows (MAFs): it consists of a stack of autoregressive Transformer blocks on image patches, alternating the autoregression direction between layers. TarFlow is straightforward to train end-to-end, and capable of directly modeling and generating pixels. We also propose three key techniques to improve sample quality: Gaussian noise augmentation during training, a post training denoising procedure, and an effective guidance method for both class-conditional and unconditional settings. Putting these together, TarFlow sets new state-of-the-art results on likelihood estimation for images, beating the previous best methods by a large margin, and generates samples with quality and diversity comparable to diffusion models, for the first time with a stand-alone NF model.
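For readers unfamiliar with why NFs are "likelihood-based", the standard change-of-variables identity is worth stating. The first equation below is the general formula for any invertible flow; the second is a MAF-style autoregressive affine step, given here as an assumed form for illustration (the symbols \mu_t and \alpha_t are hypothetical names for the outputs of a causal network, and the abstract does not spell out TarFlow's exact parameterization).

```latex
% Change of variables for an invertible map f with z = f(x) and prior p_0:
\log p(x) = \log p_0\big(f(x)\big)
          + \log \left| \det \frac{\partial f(x)}{\partial x} \right|

% One MAF-style autoregressive affine step over a sequence x_1, \dots, x_T
% (assumed form; \mu_t and \alpha_t read only the earlier entries x_{<t}):
z_t = \big(x_t - \mu_t(x_{<t})\big) \odot \exp\!\big(-\alpha_t(x_{<t})\big),
\qquad
\log \left| \det \frac{\partial z}{\partial x} \right|
  = -\sum_{t=1}^{T} \sum_{k} \alpha_{t,k}(x_{<t})
```

Because each z_t depends on x_t only through an elementwise scale, the Jacobian is triangular and its log-determinant reduces to a sum, which is what makes exact likelihood training tractable for this family of models.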

Figure 1: Samples at various resolutions generated by TarFlow.

Figure 2: Model architecture of TarFlow.
