Jamba: Production-grade Mamba-based AI model

Original link: https://www.maginative.com/article/ai21-labs-unveils-jamba-the-first-production-grade-mamba-based-ai-model/

AI21 Labs has introduced Jamba, a breakthrough AI model built on a novel hybrid architecture that combines Transformer and Mamba structures. Jamba offers a massive 256K-token context window, equivalent to roughly 210 printed pages, accommodating very large inputs and exceeding the 32,000-token limit of competitors such as Llama. Despite this expanded capacity, Jamba still fits on a single 80GB GPU, delivering impressive efficiency gains. The combination of the Transformer and Mamba architectures markedly improves performance on long input sequences, with three times the throughput of Transformer-based models of the same size. Jamba is available under the Apache 2.0 license and can be accessed via open weights on Hugging Face and through NVIDIA's API catalog (as part of the NVIDIA NIM inference microservices). For now, the model is intended primarily for research; however, AI21 Labs expects to release an enhanced, commercially viable variant soon. Further advances in capability, efficiency, and affordability are expected, marking the dawn of a new generation of more capable AI models.

This discussion covers the limitations of certain language models, particularly SSMs and Mamba, in memory usage and retention of fine detail. Commenters mention running into problems with long-context processing on Linux with large GPUs. They are excited about a production-grade model built on Mamba, but also raise concerns about its performance and throughput compared with other models. The thread suggests tweaks for eliciting longer responses, including requesting a specific length and splitting prompts into smaller parts. Frustrated by terse answers, commenters report success when explicitly asking for longer output. They also mention the large memory requirements during training and the importance of compression techniques for managing them. The discussion closes by reflecting on the relationship between AI models and the human brain, and by anticipating more public training stories in the future. It also explores Mamba's distinctive features, such as carrying state forward from previous steps and reducing the number of traditional Transformer layers while preserving accuracy over shorter ranges. However, Mamba still struggles to handle long contexts effectively and loses detail over time, which ultimately leads to poor performance on complex tasks. Despite these challenges, the field of large language models continues to evolve and advance, particularly in the exploration of novel architectures.
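To make the point about "carrying state forward from previous steps" concrete, here is a minimal, illustrative sketch of the linear state-space recurrence underlying Mamba-style layers (the shapes and matrices are made up for illustration and omit Mamba's selectivity; this is not Jamba's actual implementation). A fixed-size hidden state is updated token by token, which keeps memory constant regardless of context length but makes the state a lossy summary of the distant past:

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Minimal (non-selective) state-space recurrence, for illustration only.

    x: (seq_len, d_in)    input sequence
    A: (d_state, d_state) state transition matrix
    B: (d_state, d_in)    input projection
    C: (d_out, d_state)   output projection
    """
    seq_len, _ = x.shape
    h = np.zeros(A.shape[0])          # fixed-size state carried across steps
    ys = []
    for t in range(seq_len):
        h = A @ h + B @ x[t]          # fold the new token into the state
        ys.append(C @ h)              # read out from the compressed state
    return np.stack(ys)

# Toy usage: memory cost is O(d_state), independent of sequence length.
rng = np.random.default_rng(0)
x = rng.normal(size=(16, 4))
A = 0.9 * np.eye(8)                   # decaying state, so old details fade
B = rng.normal(size=(8, 4)) * 0.1
C = rng.normal(size=(2, 8)) * 0.1
print(ssm_scan(x, A, B, C).shape)     # (16, 2)
```

The decaying transition matrix in this toy example also illustrates why such models can lose fine detail from far-back tokens: everything seen so far must be squeezed into one fixed-size state.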

Original Article

AI21 Labs has just released Jamba, the world's first production-grade AI model based on the innovative Mamba architecture. Most models today (like GPT, Gemini and Llama) are based on the Transformer architecture. Jamba combines the strengths of both the Mamba Structured State Space model (SSM) and the traditional Transformer architecture, delivering impressive performance and efficiency gains.

Jamba boasts an extensive context window of 256K tokens, equivalent to around 210 pages of text, while fitting up to 140K tokens on a single 80GB GPU. This remarkable feat is achieved through its hybrid SSM-Transformer architecture, which leverages mixture-of-experts (MoE) layers to draw on just 12B of its available 52B parameters during inference. The result is a model that can handle significantly longer contexts than most of its counterparts, such as Meta's Llama 2 with its 32,000-token context window, while maintaining high throughput and efficiency.
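As a rough illustration of how MoE layers keep the number of active parameters low, the sketch below routes each token through only the top-k scoring experts, so only a fraction of the total weights participate in any one forward pass. The expert count, top-k value, and shapes here are illustrative assumptions, not Jamba's actual configuration:

```python
import numpy as np

def moe_layer(x, experts, router_w, top_k=2):
    """Toy top-k mixture-of-experts routing for a single token.

    Only `top_k` expert MLPs run per token, so the active parameter
    count is a small fraction of the total -- the idea behind drawing
    on roughly 12B of 52B parameters at inference (numbers here are toy).
    """
    logits = router_w @ x                         # score each expert for this token
    top = np.argsort(logits)[-top_k:]             # keep only the top_k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                      # softmax over the selected experts
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy usage: 8 experts exist, but only 2 run per token.
rng = np.random.default_rng(0)
d = 16
experts = [(lambda W: (lambda x: np.tanh(W @ x)))(rng.normal(size=(d, d)) * 0.1)
           for _ in range(8)]
router_w = rng.normal(size=(8, d)) * 0.1
y = moe_layer(rng.normal(size=d), experts, router_w)
print(y.shape)  # (16,)
```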

Jamba delivers 3x throughput on long contexts, making it a more efficient model than Transformer-based models of comparable size like Mixtral 8x7B.

One of the key advantages of Jamba is its ability to deliver 3x throughput on long contexts compared to Transformer-based models of similar size, like Mixtral 8x7B. This is made possible by the model's unique hybrid architecture, which is composed of Transformer, Mamba, and mixture-of-experts (MoE) layers, optimizing for memory, throughput, and performance simultaneously.

It features a blocks-and-layers approach, with each Jamba block containing either an attention or a Mamba layer, followed by a multi-layer perceptron (MLP). This results in an overall ratio of one Transformer layer out of every eight total layers. AI21 Labs says this approach allows the model to maximize quality and throughput on a single GPU, leaving ample memory for common inference workloads.
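The sketch below lays out such a stack under the stated one-in-eight attention ratio. The exact interleaving, offsets, and which MLPs are replaced by MoE layers are assumptions for illustration, not the released model's precise configuration:

```python
def jamba_block_layout(n_layers=16, attention_every=8, moe_every=2):
    """Schematic layer stack: mostly Mamba layers, with one attention
    (Transformer) layer out of every `attention_every` layers, each layer
    followed by an MLP that is a mixture-of-experts on some interval.
    Positions and intervals are illustrative assumptions.
    """
    layout = []
    for i in range(n_layers):
        mixer = "attention" if i % attention_every == 0 else "mamba"
        mlp = "moe" if i % moe_every == 1 else "mlp"
        layout.append((mixer, mlp))
    return layout

for i, (mixer, mlp) in enumerate(jamba_block_layout()):
    print(f"layer {i:2d}: {mixer:9s} + {mlp}")
```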

Jamba's impressive performance extends beyond efficiency and cost-effectiveness. The model has already demonstrated remarkable results on various benchmarks, matching or outperforming state-of-the-art models in its size class across a wide range of tasks.

Jamba outperforms or matches other state-of-the-art models in its size class on a wide range of benchmarks.

Jamba is being released with open weights under the Apache 2.0 license. It is available on Hugging Face, and will also be accessible from the NVIDIA API catalog as an NVIDIA NIM inference microservice, which enterprise application developers can deploy with the NVIDIA AI Enterprise software platform.
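For reference, loading the open weights with the Hugging Face transformers library would look roughly like the sketch below. The hub ID and generation settings are assumptions to verify against the model card, and a transformers version recent enough to support the Jamba architecture is required:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/Jamba-v0.1"  # assumed hub ID; verify on Hugging Face
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision if available
    device_map="auto",    # spread layers across available GPUs
)

inputs = tokenizer("Jamba is a hybrid SSM-Transformer model that", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```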

For now, Jamba is released as a research model without the necessary safeguards for commercial use. However, AI21 Labs plans to release a fine-tuned, safer version in the coming weeks. As the AI community continues to explore and refine new architectures, we can expect to see even more impressive gains in performance, efficiency, and accessibility, paving the way for a new generation of more capable AI models.
