Mistral NeMo

Original link: https://mistral.ai/news/mistral-nemo/

We are introducing Mistral NeMo, a powerful AI model built in collaboration with NVIDIA. It offers a large context window of 128k tokens, with state-of-the-art reasoning, world knowledge, and coding accuracy for its size class. Because it relies on a standard architecture, users will find it a convenient, drop-in replacement in any system currently using Mistral 7B. To promote broad adoption by researchers and enterprises, pre-trained base and instruction-tuned checkpoints are released under the Apache 2.0 license. The model was trained with quantisation awareness, enabling FP8 inference without degrading performance. Table 1 compares the accuracy of Mistral NeMo Base, Gemma 2 9B, and Llama 3 8B. The model targets many languages, including English, French, and German, making it accessible to people across these major world languages (see Figure 1). For efficiency, it uses a new tokenizer called Tekken, which compresses source code such as Python or SQL about 30% more effectively than earlier Mistral models (Figure 2). Thanks to instruction fine-tuning and alignment improvements, it also outperforms previous generations at following precise instructions, reasoning, handling multi-turn conversations, and generating code (Table 2). Weights for both the base and instruction-tuned models are available on HuggingFace; you can start experimenting with Mistral NeMo today.

The new Mistral NeMo, developed by Mistral AI in collaboration with NVIDIA, is a powerful language model with a context window of up to 128K tokens. Compared with models in the same size range, it delivers improved reasoning, world knowledge, and coding accuracy. Mistral NeMo uses a standard architecture, making it easy to adopt and compatible with systems built on Mistral 7B. Pre-trained base and instruction-tuned checkpoints are available under the Apache 2.0 license for research and enterprise use. The model is designed to run efficiently on a single NVIDIA L40S, GeForce RTX 4090, or RTX 4500, or comparable hardware. It is fast and offers enhanced security and privacy features. It exceeds expectations on performance, licensing, and energy requirements, while needing less compute thanks to its FP8 quantisation-aware training. Whereas larger models such as 70B variants may need their precision reduced to fit on consumer GPUs, Mistral NeMo delivers impressive results in a smaller package while also offering a much larger context window.

Original Text

Today, we are excited to release Mistral NeMo, a 12B model built in collaboration with NVIDIA. Mistral NeMo offers a large context window of up to 128k tokens. Its reasoning, world knowledge, and coding accuracy are state-of-the-art in its size category. As it relies on standard architecture, Mistral NeMo is easy to use and a drop-in replacement in any system using Mistral 7B.

We have released pre-trained base and instruction-tuned checkpoints under the Apache 2.0 license to promote adoption by researchers and enterprises. Mistral NeMo was trained with quantisation awareness, enabling FP8 inference without any performance loss.
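As a rough illustration of what quantisation-aware training buys in practice, the sketch below serves the model with FP8 weights through vLLM. This is a minimal sketch under stated assumptions, not an official recipe: it assumes a vLLM build with FP8 quantisation support, an FP8-capable GPU, and that the HuggingFace repository id used below matches the released instruct checkpoint.

```python
# Minimal sketch: FP8 inference with vLLM.
# Assumptions: FP8-capable GPU, vLLM built with "fp8" quantisation support,
# and the HuggingFace repo id below (not confirmed by the post).
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mistral-Nemo-Instruct-2407",  # assumed repo id
    quantization="fp8",        # dynamic FP8 weight quantisation at load time
    max_model_len=16_384,      # raise toward the 128k window as GPU memory allows
)

params = SamplingParams(temperature=0.3, max_tokens=256)
outputs = llm.generate(["Explain FP8 inference in one short paragraph."], params)
print(outputs[0].outputs[0].text)
```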

The following table compares the accuracy of the Mistral NeMo base model with two recent open-source pre-trained models, Gemma 2 9B and Llama 3 8B.

Table 1: Mistral NeMo base model performance compared to Gemma 2 9B and Llama 3 8B.

Multilingual Model for the Masses

The model is designed for global, multilingual applications. It is trained on function calling, has a large context window, and is particularly strong in English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. This is a new step toward bringing frontier AI models to everyone’s hands in all languages that form human culture.

Figure 1: Mistral NeMo performance on multilingual benchmarks.

Tekken, a more efficient tokenizer

Mistral NeMo uses a new tokenizer, Tekken, based on Tiktoken, that was trained on more than 100 languages and compresses natural language text and source code more efficiently than the SentencePiece tokenizer used in previous Mistral models. In particular, it is ~30% more efficient at compressing source code, Chinese, Italian, French, German, Spanish, and Russian. It is also 2x and 3x more efficient at compressing Korean and Arabic, respectively. Compared to the Llama 3 tokenizer, Tekken proved to be more proficient in compressing text for approximately 85% of all languages.

Figure 2: Tekken compression rate.
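The compression figures above boil down to counting how many tokens each tokenizer needs for the same input. A rough way to reproduce that kind of measurement is sketched below; it assumes both tokenizers can be fetched through transformers' AutoTokenizer (the checkpoints may require accepting a licence on HuggingFace), and the repository ids are assumptions rather than names given in the post.

```python
# Rough sketch: compare token counts for the same text across two tokenizers.
# Fewer tokens for the same input means better compression. Repo ids are assumed.
from transformers import AutoTokenizer

text = "def fibonacci(n):\n    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)"

repos = {
    "Mistral NeMo (Tekken)": "mistralai/Mistral-Nemo-Base-2407",   # assumed repo id
    "Mistral 7B (SentencePiece)": "mistralai/Mistral-7B-v0.3",     # assumed repo id
}

for name, repo in repos.items():
    tok = AutoTokenizer.from_pretrained(repo)
    n_tokens = len(tok.encode(text, add_special_tokens=False))
    print(f"{name}: {n_tokens} tokens for {len(text)} characters")
```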

Instruction fine-tuning

Mistral NeMo underwent an advanced fine-tuning and alignment phase. Compared to Mistral 7B, it is much better at following precise instructions, reasoning, handling multi-turn conversations, and generating code.

Table 2: Mistral NeMo instruction-tuned model accuracy. Evals done with GPT-4o as judge on official references.

Weights are hosted on HuggingFace for both the base and the instruct models. You can try Mistral NeMo now with mistral-inference and adapt it with mistral-finetune. Mistral NeMo is exposed on la Plateforme under the name open-mistral-nemo-2407. This model is also packaged in a container as an NVIDIA NIM inference microservice and is available from ai.nvidia.com.
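For reference, a minimal sketch of calling the model on la Plateforme is shown below. The only identifier taken from the post is the model name open-mistral-nemo-2407; the client interface shown assumes the mistralai Python SDK (v1-style client) and an API key in the environment.

```python
# Minimal sketch: querying open-mistral-nemo-2407 on la Plateforme.
# Assumes the mistralai Python SDK (v1-style client) and MISTRAL_API_KEY set.
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="open-mistral-nemo-2407",  # model name from the post
    messages=[{"role": "user", "content": "Summarise what Tekken improves over SentencePiece."}],
)
print(response.choices[0].message.content)
```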
