MiniMax M2.7 Is Now Open Source

Original link: https://firethering.com/minimax-m2-7-agentic-model/

## MiniMax M2.7: A Self-Improving AI Model

MiniMax has released M2.7, a new AI model notable for its **self-evolution capability**. Departing from conventional AI development, M2.7 was given a programming scaffold and allowed to independently analyze failures, modify code, and optimize performance across multiple rounds of iteration, achieving a **30% improvement** with no human intervention.

M2.7 performs strongly across benchmarks, matching GPT-5.3 in **software engineering (SWE-Pro: 56.22%)** and approaching Opus 4.6 in **full project delivery (VIBE-Pro: 55.6%)**. It stands out in **office productivity (GDPval-AA ELO: 1495)**, handling document editing in Word, Excel, and PowerPoint with high quality. Notably, it achieved a **66.6% medal rate on MLE Bench Lite** and kept improving over extended test runs, demonstrating the long-horizon behavior that is critical for agentic applications.

The model is available on **HuggingFace**, with downloadable weights and access through a **free NVIDIA API**. However, its license restricts **commercial use**, requiring prior written authorization from MiniMax and prominent attribution. Even so, M2.7 represents a significant step toward more autonomous and capable AI, offering developers and researchers a powerful tool.

## MiniMax M2.7: Open Weights, Not Open Source

MiniMax M2.7, a new AI model, has released its weights, but it is **not open source in the true sense**. Its license permits non-commercial use under MIT-like terms, but commercial applications require prior authorization and carry additional restrictions. Many commenters have stressed this distinction, calling the release "open weights" rather than "open source."

Users' experiences with MiniMax for coding are mixed. Some find it needs heavy prompting and "reminders" to finish tasks and falls short of models like Claude. Others report success with M2.5 and expect M2.7 to do better with carefully crafted prompts.

Thanks to Unsloth, GGUF versions are now available. NVIDIA is offering a free API to test the model, though some users have run into account verification issues. The discussion also covered the potential cost savings of self-hosting open weights, along with the benefits for data sovereignty. Prior discussions of the same topic already exist on Hacker News.

MiniMax handed an internal version of M2.7 a programming scaffold and let it run unsupervised. Over 100 rounds it analyzed its own failures, modified its own code, ran evaluations, and decided what to keep and what to revert. The result was a 30% performance improvement with nobody directing each step. That is not a benchmark result. That is a different way of thinking about how AI models get built.

M2.7 is now available on HuggingFace with weights you can download and deploy. NVIDIA is offering free API access if you want to try it without the hardware overhead. The license has a commercial limitation worth knowing about; we will get to that below.

## What self-evolution actually means here

MiniMax used M2.7 during its own development to update memory, build skills for reinforcement learning experiments, and improve its own learning process based on experiment results. The model was a participant in its own training pipeline.

The clearest demonstration is the MLE Bench Lite result. MiniMax gave M2.7 access to 22 machine learning competitions, each runnable on a single A30 GPU, and let it run three 24-hour trials with a simple harness built around short-term memory, self-feedback, and self-optimization. After each round the model generated a memory file, criticized its own results, and fed those observations into the next round.
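That harness loop — propose a change, evaluate it, keep or revert, carry observations forward — can be sketched generically. The sketch below is a toy illustration of the pattern, not MiniMax's actual harness: the `propose` and `evaluate` stubs stand in for real model calls and competition scoring.

```python
# Toy sketch of a self-optimization harness: each round proposes a change,
# evaluates it, keeps improvements and reverts regressions, and carries a
# "memory" of observations into the next round. The stubs below are
# hypothetical stand-ins, NOT MiniMax's implementation.
from dataclasses import dataclass, field

@dataclass
class Harness:
    solution: dict                               # current best artifact (code/config)
    memory: list = field(default_factory=list)   # observations carried across rounds

    def evaluate(self, solution: dict) -> float:
        # Stand-in metric; a real harness would run the competition's scorer here.
        return solution.get("quality", 0.0)

    def propose(self, round_no: int) -> dict:
        # Stand-in for the model proposing an edit, informed by its memory.
        candidate = dict(self.solution)
        candidate["quality"] = candidate.get("quality", 0.0) + (1 if round_no % 3 else -1)
        return candidate

    def run(self, rounds: int) -> float:
        best = self.evaluate(self.solution)
        for r in range(rounds):
            candidate = self.propose(r)
            score = self.evaluate(candidate)
            if score > best:                     # keep improvements...
                self.solution, best = candidate, score
                self.memory.append(f"round {r}: kept (score {score})")
            else:                                # ...revert regressions
                self.memory.append(f"round {r}: reverted (score {score})")
        return best

final = Harness(solution={"quality": 0.0}).run(rounds=10)
```

The key design point the article describes is the keep-or-revert decision plus persistent memory; everything else (how proposals are generated, what the metric is) is task-specific.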

The best run achieved 9 gold medals, 5 silver medals, and 1 bronze across those 22 competitions. The average medal rate across all three trials was 66.6%, behind only Opus 4.6 at 75.7% and GPT-5.4 at 71.2%.
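As a sanity check, the best run's medal arithmetic works out like this (the quoted 66.6% is the average over all three trials, so the best single run sits a little above it):

```python
# Best-run medal arithmetic from the MLE Bench Lite result:
# 9 gold + 5 silver + 1 bronze across 22 competitions.
medals = 9 + 5 + 1
competitions = 22
best_run_rate = medals / competitions * 100
print(f"{best_run_rate:.1f}%")  # best single run, vs. the 66.6% three-trial average
```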

What makes this interesting is not the medal count. It is that the improvement was continuous across all three 24-hour windows. The model kept finding better approaches the longer it ran, which connects directly to the long-horizon behavior that makes agentic models actually useful in production.

## What M2.7 can do

The benchmark that matters most for developers is SWE-Pro, which tests real software engineering across multiple programming languages. M2.7 scores 56.22%, matching GPT-5.3-Codex. On SWE Multilingual it scores 76.5% and on Multi SWE Bench 52.7%, both of which test closer to real-world engineering scenarios.

It can correlate monitoring metrics with deployment timelines, run statistical analysis on trace data, connect to databases to verify root causes, and make SRE level decisions about how to stop the bleeding before submitting a fix. MiniMax claims it has reduced live production incident recovery time to under three minutes on multiple occasions using M2.7.

On VIBE-Pro, which tests end-to-end full project delivery across web, Android, iOS, and simulation tasks, M2.7 scores 55.6%, close to Opus 4.6. That means you can hand it a complete project requirement and expect something usable back.

Native Agent Teams support is the other practical capability. The model can maintain stable role identity across multi-agent setups, make autonomous decisions within complex state machines, and challenge other agents on logical gaps. That is not prompt engineering; it is internalized behavior.

## The office and productivity angle

Software engineering gets most of the attention with agentic models, but M2.7 has a serious productivity story. On GDPval-AA, which measures professional task delivery across real office scenarios, M2.7 scores an ELO of 1495. That is the highest among open-weight models and sits above GPT-5.3, though Opus 4.6, Sonnet 4.6, and GPT-5.4 still lead it.

The practical capability is in document work. M2.7 handles Word, Excel, and PPT with multi-round, high-fidelity editing, meaning you can give it an existing file, ask for revisions across multiple interactions, and get back something editable. MiniMax demonstrated this with a TSMC financial analysis task where the model read annual reports, cross-referenced research, built a revenue forecast model, and produced a finished PPT and Word report. Their own finance practitioners called the output usable as a first draft.

On Toolathon it scores 46.3%, which puts it in the global top tier for tool use accuracy. It maintains 97% skill compliance across 40 complex skills on MM Claw, each skill exceeding 2,000 tokens. That last number matters for anyone building agent workflows with large complex skill libraries.


## License: what open source actually means here

This is the part to read carefully before building anything on M2.7.

The license looks like MIT at first glance, but it is not MIT. Non-commercial use is free with no restrictions. Commercial use requires prior written authorization from MiniMax. You need to contact [email protected] and get approval before shipping any product that uses M2.7 or charges users for access to it.

There is also a display requirement. Any commercial use must prominently show “Built with MiniMax M2.7” on a related website, interface, or documentation.

For researchers, students, hobbyists, and anyone experimenting locally, none of this affects you. For developers building commercial products, get in touch with MiniMax before you ship. The weights are available and the model is genuinely capable, just go in with clear eyes about what the license actually permits.

## How to try it today

The fastest way is NVIDIA’s free API access. No local setup or hardware requirements: just an API key and you are talking to M2.7 immediately. If you want to evaluate it before committing to anything, start here.
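NVIDIA's hosted model APIs are typically OpenAI-compatible, so a call would look roughly like the sketch below. The base URL follows NVIDIA's usual pattern and the model id is a hypothetical placeholder — check NVIDIA's catalog for the exact values before use.

```python
# Minimal sketch of calling M2.7 through an OpenAI-compatible endpoint.
# BASE_URL and MODEL_ID are assumptions, not confirmed values -- verify
# both against NVIDIA's model catalog before relying on this.
import json
import os
import urllib.request

BASE_URL = "https://integrate.api.nvidia.com/v1"   # assumed NVIDIA endpoint style
MODEL_ID = "minimaxai/minimax-m2.7"                # hypothetical model id

def build_request(prompt: str) -> urllib.request.Request:
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ.get('NVIDIA_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )

if __name__ == "__main__":
    req = build_request("Summarize this stack trace and propose a fix: ...")
    with urllib.request.urlopen(req) as resp:       # needs a valid API key
        print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the endpoint is OpenAI-compatible, the `openai` client library with a custom `base_url` works just as well as raw HTTP.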

For local deployment the weights are on HuggingFace. SGLang is the recommended inference framework, with vLLM and Transformers also supported. Be honest with yourself about the hardware requirements before going this route; this is a large model and local deployment needs serious infrastructure.
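A local SGLang launch typically looks like the following. The HuggingFace repo id here is an assumption, and the right tensor-parallel size depends entirely on your hardware — verify both against the model card before running.

```shell
# Hypothetical SGLang launch for local serving. The repo id below is assumed;
# check the actual HuggingFace model card for the correct path and any
# recommended flags. Adjust --tp (tensor parallel size) to your GPU count.
pip install "sglang[all]"

python -m sglang.launch_server \
  --model-path MiniMaxAI/MiniMax-M2.7 \
  --tp 8 \
  --host 0.0.0.0 --port 30000
```

The server then exposes an OpenAI-compatible endpoint on port 30000, so existing client code can point at it with only a base-URL change.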

MiniMax Agent at agent.minimax.io gives you a hosted interface if you want to test the agentic capabilities without any setup at all. The API platform at platform.minimax.io is the developer path for anyone building on top of it within the license terms.

## Top-tier AI in your hands

M2.7 is one of the more capable agentic models available with public weights right now. The self-evolution story is not just interesting backstory; it shows up in the benchmark results and in the kind of sustained improvement over long-running tasks that most models cannot maintain.

The software engineering numbers are competitive with the best closed models. The office productivity angle is genuinely useful for teams doing real document work. The 66.6% medal rate on MLE Bench Lite, achieved autonomously over 24-hour windows, tells you something real about how this model behaves when you give it a hard problem and step back.

The ceiling on what you can do with M2.7 is genuinely high. The question is whether your use case fits within the terms.
