LeJEPA


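## LeJEPA: A New Self-Supervised Learning Method

This article introduces **LeJEPA**, a new self-supervised learning objective built on Joint-Embedding Predictive Architectures (JEPAs). Existing JEPAs lack theoretical grounding and practical guidance; LeJEPA offers a scalable, theoretically grounded alternative. Its core contribution is identifying the isotropic Gaussian as the ideal distribution for embeddings and introducing **Sketched Isotropic Gaussian Regularization (SIGReg)** to enforce it. The result is a simpler training procedure with a single hyperparameter and linear complexity that remains stable across architectures (ResNets, ViTs, ConvNets) and datasets. Notably, LeJEPA removes common heuristics such as stop-gradient and teacher-student setups, reduces the implementation to about 50 lines of code, and supports efficient distributed training. Empirical results on more than 10 datasets show strong performance, including 79% accuracy on ImageNet-1k under linear evaluation with a ViT-H/14, and suggest that LeJEPA could re-establish self-supervised pre-training as a core area of AI research. The code is available on GitHub.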
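
The summary does not say how SIGReg is computed, so the following is only a minimal sketch of the general idea. It assumes that "sketched" means projecting embeddings onto a few random unit directions and scoring each 1D projection against a standard normal. The function name `sketched_gaussian_penalty`, the parameters `num_directions` and `t_grid`, and the characteristic-function statistic are illustrative choices, not taken from the paper's code.

```python
import numpy as np


def sketched_gaussian_penalty(z, num_directions=64, t_grid=None, seed=0):
    """Score how far random 1D projections of `z` are from N(0, 1).

    z: array of shape (n_samples, dim) holding embeddings.
    Returns a non-negative float; smaller means closer to isotropic Gaussian.
    Hypothetical helper for illustration only.
    """
    z = np.asarray(z, dtype=np.float64)
    rng = np.random.default_rng(seed)
    if t_grid is None:
        t_grid = np.linspace(-3.0, 3.0, 17)
    t_grid = np.asarray(t_grid, dtype=np.float64)

    # Random unit directions: the "sketch" that stands in for a full
    # dim-dimensional distribution test (assumed interpretation).
    directions = rng.standard_normal((z.shape[1], num_directions))
    directions /= np.linalg.norm(directions, axis=0, keepdims=True)

    # One matrix product; cost grows linearly with n_samples and with dim.
    proj = z @ directions  # shape (n_samples, num_directions)

    # Empirical characteristic function E[exp(i t x)] per direction and t.
    phase = proj[:, :, None] * t_grid  # shape (n_samples, num_directions, T)
    ecf_real = np.cos(phase).mean(axis=0)
    ecf_imag = np.sin(phase).mean(axis=0)

    # The characteristic function of N(0, 1) is exp(-t^2 / 2), purely real.
    target = np.exp(-0.5 * t_grid**2)
    sq_err = (ecf_real - target) ** 2 + ecf_imag**2
    return float(sq_err.mean())


rng = np.random.default_rng(1)
gaussian = rng.standard_normal((2000, 16))
print(sketched_gaussian_penalty(gaussian))                # close to zero
print(sketched_gaussian_penalty(3.0 * gaussian))          # larger: wrong scale
print(sketched_gaussian_penalty(np.zeros((2000, 16))))    # larger: collapsed
```

A penalty of this kind is small only when every sampled direction looks like a unit Gaussian, so both collapsed and badly scaled embeddings are penalized without stop-gradient or a teacher network. Whether the paper uses this exact statistic is not stated in the summary.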

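A recent Hacker News discussion centered on Yann LeCun's Joint-Embedding Predictive Architecture (JEPA) and his criticism of autoregressive large language models (LLMs). One user shared a link to a LeCun talk explaining the conceptual model behind JEPA, which he positions as an alternative to the mainstream LLM approach. Commenters, however, were skeptical. One argued that LeCun often uses straw-man arguments against LLMs, claiming that they lack world modeling or planning abilities, problems that many believe current techniques can address. Another pointed out that, even with pretraining on datasets such as ImageNet-1k, LeCun has not demonstrated that JEPA scales well enough to compete with LLMs. Finally, one user shared personal experimental results suggesting that JEPA does not currently outperform conventional machine learning objectives. The conversation highlights the ongoing debate over the future of AI architectures and the relative merits of different approaches to artificial general intelligence.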

[Submitted on 11 Nov 2025 (v1), last revised 14 Nov 2025 (this version, v3)]

LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics, by Randall Balestriero and 1 other author


Abstract: Learning manipulable representations of the world and its dynamics is central to AI. Joint-Embedding Predictive Architectures (JEPAs) offer a promising blueprint, but lack of practical guidance and theory has led to ad-hoc R&D. We present a comprehensive theory of JEPAs and instantiate it in **LeJEPA**, a lean, scalable, and theoretically grounded training objective. First, we identify the isotropic Gaussian as the optimal distribution that JEPAs' embeddings should follow to minimize downstream prediction risk. Second, we introduce a novel objective, **Sketched Isotropic Gaussian Regularization** (SIGReg), to constrain embeddings to reach that ideal distribution. Combining the JEPA predictive loss with SIGReg yields LeJEPA with numerous theoretical and practical benefits: (i) single trade-off hyperparameter, (ii) linear time and memory complexity, (iii) stability across hyper-parameters, architectures (ResNets, ViTs, ConvNets) and domains, (iv) heuristics-free, e.g., no stop-gradient, no teacher-student, no hyper-parameter schedulers, and (v) distributed training-friendly implementation requiring only ≈50 lines of code. Our empirical validation covers 10+ datasets, 60+ architectures, all with varying scales and domains. As an example, using ImageNet-1k for pretraining and linear evaluation with frozen backbone, LeJEPA reaches 79% with a ViT-H/14. We hope that the simplicity and theory-friendly ecosystem offered by LeJEPA will reestablish self-supervised pre-training as a core pillar of AI research (GitHub repo).
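
One plausible reading of the "single trade-off hyperparameter" is a weighted sum of the two terms named in the abstract. The form below is an assumption for illustration: the symbol $\lambda$, the name $\mathcal{L}_{\text{pred}}$ for the JEPA predictive loss, and the convex-combination weighting are not given in the abstract.

$$
\mathcal{L}_{\text{LeJEPA}}(\theta) = (1 - \lambda)\,\mathcal{L}_{\text{pred}}(\theta) + \lambda\,\mathrm{SIGReg}(\theta), \qquad \lambda \in [0, 1]
$$

Under this reading, $\lambda = 0$ leaves only the predictive loss and $\lambda = 1$ leaves only the Gaussian regularizer, so $\lambda$ is the one knob that balances prediction against the shape of the embedding distribution.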

From: Randall Balestriero
[v1] Tue, 11 Nov 2025 18:21:55 UTC (12,072 KB)
[v2] Wed, 12 Nov 2025 14:26:39 UTC (12,072 KB)
[v3] Fri, 14 Nov 2025 08:38:32 UTC (12,072 KB)
