MAI-Thinking-1

Original link: https://microsoft.ai/news/introducing-mai-thinking-1/

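Microsoft has introduced **MAI-Thinking-1**, a strong medium-sized reasoning model (35B active parameters, about 1T total). Designed for enterprise and software engineering applications, it performs well in coding and mathematics: it scores 97.0% on AIME 2025 and is preferred to Sonnet 4.6 in blind human side-by-side evaluations. Crucially, MAI-Thinking-1 departs from a current industry trend. It was built from scratch without distillation from third-party models, so that its capabilities are genuinely learned rather than inherited. The model is the flagship product of Microsoft's new "Hill-Climbing Machine," a proprietary end-to-end development pipeline. The system prioritizes three core pillars:

1. **Self-sufficiency:** reliance on in-house infrastructure and accelerators.
2. **Clean data:** only high-quality, commercially licensed data, with AI-generated content excluded from pre-training to preserve provenance and control.
3. **Learned capabilities:** rigorous, deterministic training environments that force the model to master tasks rather than imitate other models.

By optimizing for a smaller, more efficient footprint, Microsoft aims to make advanced agentic intelligence usable in everyday developer workflows. The company presents this as an important step toward its goal of "Humanist Superintelligence": AI intended to augment human productivity rather than replace it.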

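Microsoft has released "MAI-Thinking-1," a new AI model developed by its in-house MAI division. A key differentiator is the training approach: Microsoft emphasizes that the model was built from scratch entirely on clean, commercially licensed, enterprise-grade data, deliberately excluding synthetic data and content distilled from third-party models. The move signals a strategic focus on data provenance and model transparency.

On Hacker News, the announcement prompted discussion of what this "clean data" approach might mean for AI scaling laws. Some users questioned how far a supposedly "clean" training set still depends on underlying synthetic data. Some commenters saw a significant divergence from OpenAI's current trajectory, while others raised technical complaints, such as the website's poor UI/UX. Overall, the release is being watched closely as a notable new entrant in a fast-evolving AI field.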

Original text

Today we are introducing MAI-Thinking-1, Microsoft AI’s reasoning model. It is a medium-sized model that stands among the strongest models in its weight class. It matches leading models on key software engineering benchmarks, demonstrates advanced mathematical reasoning capabilities, and is preferred to Sonnet 4.6 in our blind human side-by-side evaluations. We trained it from the ground up on enterprise-grade, clean, and commercially licensed data, without distillation from third-party models.

MAI-Thinking-1 is a step in our broader work to build towards Humanist Superintelligence: advanced AI capabilities designed to serve people and organizations, not to replace them. The model matters on both axes: what it can do, and how it was built.

The Hill-Climbing Machine

More than a single model, we are excited to introduce our Hill-Climbing Machine: a co-designed pipeline built to make every component of model development climbable, so capabilities improve continually and reliably over time. The aim is a repeatable system that can absorb better data, stronger rewards, more capable environments, and more compute.

Three main pillars guide our philosophy.

First, capabilities should be learned, not inherited. Although faster to acquire, inherited intelligence lacks the steerability essential for real-world usage: an imitator is fundamentally tied to the design choices of its teacher and struggles to adapt to new situations. MAI-Thinking-1 was trained without distillation from third-party models, forcing our model to truly learn the tasks at hand.

Second, clean data. MAI-Thinking-1 was trained on clean and appropriately licensed data, with AI-generated content excluded from pre-training. This matters for quality, provenance, and control. If we cannot account for what shaped a model, we cannot fully understand its behavior or credibly improve it.

Third, self-sufficiency across the entire stack. All the way from co-design of our models with MSFT’s own accelerators through to our reinforcement learning framework, we have focused efforts on in-house training infrastructure. This is a crucial part of building our hill-climbing machine, to ensure we can fully optimize and shape our systems end-to-end to best serve our needs.

Medium-sized model, with strong software engineering performance

MAI-Thinking-1 is a sparse Mixture of Experts model with 35B active and ~1T total parameters, which gives it a smaller inference footprint than much larger models. Despite this, our model is toe-to-toe with Claude Opus 4.6 on SWE-Bench Pro. That matters for developers and enterprises because model size determines where advanced coding assistance can be deployed, how often it can be used, and whether it can move from exceptional tasks into daily workflows.
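To make the "35B active, ~1T total" figures concrete, the short Python sketch below computes the fraction of parameters used per token. The parameter counts come from the announcement. The per-token compute estimate relies on a common rule of thumb (about 2 FLOPs per active parameter for a forward pass, ignoring attention cost), which is an assumption of this sketch and not a figure from Microsoft. The function names are hypothetical.

```python
# Parameter counts quoted in the announcement (the total is approximate).
ACTIVE_PARAMS = 35e9   # 35B active parameters per token
TOTAL_PARAMS = 1e12    # ~1T total parameters


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters that participate in one token."""
    return active / total


def approx_forward_flops_per_token(active: float) -> float:
    """Rough forward-pass cost per token.

    Assumption (not from the announcement): about 2 FLOPs per active
    parameter, ignoring attention and routing overhead.
    """
    return 2.0 * active


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    flops = approx_forward_flops_per_token(ACTIVE_PARAMS)
    print(f"active fraction: {frac:.1%}")
    print(f"approx. forward FLOPs per token: {flops:.2e}")
```

Under these assumptions, only a few percent of the weights are exercised for any given token, which is the sense in which a sparse model can be cheaper to run than a dense model of the same total size. All of the weights still have to be stored somewhere, so this estimate says nothing about memory requirements.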

We have invested heavily in the training environments needed for agentic coding. Each verified environment is deterministic, executable, and graded by real test suites. This gives the model practice on the kind of multi-step work developers actually do: reading code, editing files, running tests, observing failures, and recovering from intermediate mistakes.
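As an illustration of what "graded by real test suites" could look like, here is a minimal Python sketch of a test-based grader. It is not Microsoft's environment or reward code, which the announcement does not describe. The names `grade_with_tests` and `grade_episode`, the toy `add` task, and the binary pass/fail reward are all hypothetical choices made for this example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical toy task: the candidate must provide solution.add(a, b).
TEST_SOURCE = '''
import unittest
from solution import add


class TestAdd(unittest.TestCase):
    def test_add(self):
        self.assertEqual(add(2, 3), 5)
'''


def grade_with_tests(workdir: Path, timeout_s: float = 60.0) -> float:
    """Run the unittest suite in workdir; return 1.0 on success, else 0.0."""
    try:
        result = subprocess.run(
            [sys.executable, "-m", "unittest", "discover", "-s", str(workdir)],
            cwd=workdir,
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # A hung candidate counts as a failure.
        return 0.0
    return 1.0 if result.returncode == 0 else 0.0


def grade_episode(solution_source: str) -> float:
    """Write a candidate solution next to the fixed tests and grade it."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        (workdir / "solution.py").write_text(solution_source)
        (workdir / "test_solution.py").write_text(TEST_SOURCE)
        return grade_with_tests(workdir)


if __name__ == "__main__":
    good = "def add(a, b):\n    return a + b\n"
    bad = "def add(a, b):\n    return a - b\n"
    print("good candidate reward:", grade_episode(good))
    print("bad candidate reward:", grade_episode(bad))
```

The score here depends only on the test suite's exit code, so the same candidate always receives the same grade. That is one simple way to read the "deterministic, executable" property the announcement mentions. A real agentic environment would also need sandboxing and multi-step interaction, which this sketch omits.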

Advanced mathematical reasoning capabilities

MAI-Thinking-1 reaches 97.0% on AIME 2025, and 94.5% on AIME 2026, showing strong mathematical and scientific reasoning for its weight class. Strong performance here gives us confidence that our training loop can create real reasoning gains – climbing all the way from the ground up – from our own data, rewards, and evaluation process, enabling this intelligence to generalize to other domains over time.
