GLM-5.2 is the new leading open weights model on Artificial Analysis

Original link: https://artificialanalysis.ai/articles/glm-5-2-is-the-new-leading-open-weights-model-on-the-artificial-analysis-intelligence-index

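Z ai's **GLM-5.2** has become the leading open weights model on the Artificial Analysis Intelligence Index, with a score of 51. Although it keeps the same size as its predecessor GLM-5.1 (744B total / 40B active parameters), its performance improves markedly, especially in scientific reasoning, where key evaluations gain up to 16 points.

Key highlights:

* **Performance:** On both the Intelligence Index and GDPval-AA v2, the model beats competitors such as MiniMax-M3 and DeepSeek V4 Pro, and is on par with proprietary models such as GPT-5.5.
* **Efficiency:** Although its cost per task is higher than that of some peers, GLM-5.2 sits on the Pareto frontier of Intelligence vs. Cost per Task, offering high performance at a competitive price.
* **Technical upgrades:** The context window grows from 200K to 1M tokens, and the model reasons more heavily, using 43k output tokens per task.
* **Availability:** The model is released under the MIT license and is available through Z ai's API and many third-party providers.

In short, GLM-5.2 marks a significant leap in open weights model capability, balancing advanced agentic performance, higher reasoning accuracy, and a lower hallucination rate.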

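The new GLM-5.2 model has taken the top spot among open weights models on the Artificial Analysis leaderboard. The community broadly agrees that the model is markedly improved and has very high raw intelligence, but concerns about its efficiency have also emerged.

One user report says GLM-5.2 shows excessive "reasoning overhead" on coding tasks. Specifically, the model spent more than 15 minutes and consumed 45,000 tokens just to begin writing a relatively small Nim math library. By comparison, the current GPT-5.5 model uses far fewer tokens on similar tasks. Community feedback suggests that, although GLM-5.2 is close to the frontier, future development should prioritize reasoning efficiency and cut unnecessary token consumption.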

Original text

Z ai’s GLM-5.2 is the new leading open weights model on the Artificial Analysis Intelligence Index, scoring 51, and it sits on the Pareto frontier of Intelligence vs Cost per Task

GLM-5.2 is the same size as GLM-5.1 (744B total / 40B active parameters) but scores 11 points higher on the Intelligence Index v4.1, placing ahead of MiniMax-M3 (44) and DeepSeek V4 Pro (max, 44). On the first-party API it is priced in line with GLM-5.1 at $1.4/$4.4/$0.26 per 1M input/output/cache hit tokens

Key results:

GLM-5.2 is the leading open weights model on the Intelligence Index v4.1. At 51, it leads MiniMax-M3 (44), DeepSeek V4 Pro (max, 44) and Kimi K2.6 (43)

Improvements across most evaluations, particularly scientific reasoning: GLM-5.2 gains over GLM-5.1 on most evaluations, led by scientific reasoning on CritPt (+16 points to 21%) and HLE (+12 points to 40%), alongside AA-LCR (+9 points to 71%), tau3 banking (+15 points to 27%) and SciCode (+7 points to 50%). TerminalBench v2.1 also improves (+16 points to 78%) and GPQA Diamond gains 3 points to 89%

Leading open weights model on GDPval-AA v2 and competitive with proprietary models: GLM-5.2 scores 1524 on GDPval-AA v2, ahead of MiniMax-M3 (1418) and DeepSeek V4 Pro (max, 1328). This result places GLM-5.2 in line with proprietary models including GPT-5.5 (xhigh reasoning). GDPval-AA v2 builds on the original GDPval-AA by baselining Elo to human performance at 1000, introducing a rotating panel of frontier-model judges, and raising the turn limit from 100 to 250 for longer-horizon agent trajectories

GLM-5.2 uses more output tokens per task than other leading open weights models: the model uses 43k output tokens per Intelligence Index task, up from GLM-5.1 (26k) and above MiniMax-M3 (24k), Kimi K2.6 (35k) and DeepSeek V4 Pro (max, 37k)

On the Intelligence vs. Cost per Task Pareto Frontier: GLM-5.2 is on the Pareto frontier of the Intelligence vs Cost per Task chart, with the lowest cost per task among models at its intelligence level. GLM-5.2 costs ~$0.46 per task, compared to GLM-5.1 ($0.25), Kimi K2.6 ($0.31), MiniMax-M3 ($0.18) and DeepSeek V4 Pro (max, $0.05)
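
The Pareto claim above can be checked against the figures quoted in this article alone. The sketch below is illustrative and is not Artificial Analysis's method: `Model` and `pareto_frontier` are hypothetical names, GLM-5.1's score of 40 is inferred from "11 points higher" relative to 51, and the real chart covers many more models than these five.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    name: str
    intelligence: float   # Intelligence Index v4.1 score (higher is better)
    cost_per_task: float  # USD per task (lower is better)


# Figures quoted in the article; GLM-5.1's score is inferred as 51 - 11.
MODELS = [
    Model("GLM-5.2", 51, 0.46),
    Model("GLM-5.1", 40, 0.25),
    Model("Kimi K2.6", 43, 0.31),
    Model("MiniMax-M3", 44, 0.18),
    Model("DeepSeek V4 Pro (max)", 44, 0.05),
]


def pareto_frontier(models: List[Model]) -> List[Model]:
    """Return models that no other model dominates, sorted by cost.

    A model is dominated when another model is at least as intelligent and
    at least as cheap, and strictly better on one of the two axes.
    """
    frontier = []
    for m in models:
        dominated = any(
            o.intelligence >= m.intelligence
            and o.cost_per_task <= m.cost_per_task
            and (o.intelligence > m.intelligence or o.cost_per_task < m.cost_per_task)
            for o in models
        )
        if not dominated:
            frontier.append(m)
    return sorted(frontier, key=lambda m: m.cost_per_task)


if __name__ == "__main__":
    for model in pareto_frontier(MODELS):
        print(model.name, model.intelligence, model.cost_per_task)
```

Within this five-model subset, only DeepSeek V4 Pro (max) and GLM-5.2 remain on the frontier: each of the other three is matched or beaten on intelligence by a cheaper model, while nothing in the list reaches GLM-5.2's score at any price.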

Additional Model Details:

License: MIT

Size: 744B total parameters, 40B active parameters, equivalent to GLM-5.1

Context window: 1M tokens, up from 200K on GLM-5.1

Pricing: $1.4/$0.26/$4.4 per 1M input/cache hit/output tokens

Availability: Alongside Z ai's first-party API, GLM-5.2 is available across third-party providers including DeepInfra, Novita, Nebius, Parasail, Siliconflow, GMI Cloud, Baseten, and Fireworks
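
To make the per-1M-token pricing above concrete, here is a minimal sketch of a per-request cost estimate. It assumes, without confirmation from the article, that cached tokens are a subset of the input tokens and are billed at the cache hit rate instead of the input rate, and that reasoning tokens are billed as output tokens. The function name `request_cost` is hypothetical.

```python
# Prices in USD per 1M tokens, as quoted for Z ai's first-party API.
PRICE_INPUT = 1.4
PRICE_CACHE_HIT = 0.26
PRICE_OUTPUT = 4.4


def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    Assumption: cached_tokens counts the part of input_tokens served from
    cache, billed at the cache hit rate instead of the full input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * PRICE_INPUT
        + cached_tokens * PRICE_CACHE_HIT
        + output_tokens * PRICE_OUTPUT
    )
    return total / 1_000_000


if __name__ == "__main__":
    # 100k input tokens (80k of them cached) and 43k output tokens.
    print(f"${request_cost(100_000, 43_000, cached_tokens=80_000):.3f}")
```

Under these assumptions, 43k output tokens alone come to about $0.19, so output tokens account for a large share of a typical task's cost at this price ratio.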

GLM-5.2 leads all open weights models on GDPval-AA v2, our primary metric for real-world agentic performance. At 1524 it places ahead of MiniMax-M3 (1418) and DeepSeek V4 Pro (max, 1328), and is effectively level with GPT-5.5 (xhigh, 1514). We visually inspected GLM-5.2's outputs across a range of GDPval-AA tasks. We have attached a selection below.

GLM-5.2 scores 4 on the AA-Omniscience Index, up from GLM-5.1 (2). The gain comes from both higher accuracy (25.1% vs 24.2%) and a lower hallucination rate (28.1% vs 29.4%), with attempt rate flat at 47%.
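
The relationship between these three numbers can be sketched as follows. The formula is an assumption, not something stated in this article: it treats the index as the percentage of correct answers minus the percentage of incorrect answers, and the hallucination rate as the share of non-correct responses that were answered incorrectly instead of being declined. The function name `omniscience_index` is hypothetical.

```python
def omniscience_index(accuracy_pct: float, hallucination_rate_pct: float) -> float:
    """Approximate an index value from accuracy and hallucination rate.

    Assumed definitions (not confirmed by the article):
      incorrect % = hallucination rate * (100 - accuracy %)
      index       = accuracy % - incorrect %
    """
    non_correct_pct = 100.0 - accuracy_pct
    incorrect_pct = non_correct_pct * hallucination_rate_pct / 100.0
    return accuracy_pct - incorrect_pct


if __name__ == "__main__":
    print(f"GLM-5.2: {omniscience_index(25.1, 28.1):.2f}")
    print(f"GLM-5.1: {omniscience_index(24.2, 29.4):.2f}")
```

Under this assumed formula the quoted figures give roughly 4.05 for GLM-5.2 and 1.91 for GLM-5.1, which round to the reported scores of 4 and 2. That agreement supports the assumption but does not confirm it.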

GLM-5.2 uses 43k output tokens per Intelligence Index task, of which 37k is reasoning. This is up from GLM-5.1 (26k) and higher than open weights peers MiniMax-M3 (24k) and Kimi K2.6 (35k), placing it among the less token-efficient open weights models at its intelligence level. GLM-5.2 sits off the most attractive quadrant on the Intelligence vs Output Tokens chart.
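
A small sketch of the arithmetic behind the token-efficiency comparison, using only figures quoted in this article (the DeepSeek V4 Pro figure of 37k comes from the key results list). The helper names `reasoning_share` and `relative_usage` are hypothetical.

```python
from typing import Dict

# Output tokens per Intelligence Index task, in thousands, as quoted above.
OUTPUT_TOKENS_K = {
    "GLM-5.2": 43,
    "GLM-5.1": 26,
    "MiniMax-M3": 24,
    "Kimi K2.6": 35,
    "DeepSeek V4 Pro (max)": 37,
}
GLM_5_2_REASONING_K = 37  # reasoning tokens within GLM-5.2's 43k output


def reasoning_share(reasoning_k: float, total_k: float) -> float:
    """Fraction of output tokens spent on reasoning."""
    return reasoning_k / total_k


def relative_usage(baseline: str, tokens_k: Dict[str, float]) -> Dict[str, float]:
    """How many times more output tokens the baseline uses than each peer."""
    base = tokens_k[baseline]
    return {name: base / k for name, k in tokens_k.items() if name != baseline}


if __name__ == "__main__":
    share = reasoning_share(GLM_5_2_REASONING_K, OUTPUT_TOKENS_K["GLM-5.2"])
    print(f"Reasoning share of GLM-5.2 output: {share:.0%}")
    for name, ratio in relative_usage("GLM-5.2", OUTPUT_TOKENS_K).items():
        print(f"GLM-5.2 uses {ratio:.2f}x the output tokens of {name}")
```

By these figures, about 86% of GLM-5.2's output tokens are reasoning, and it uses roughly 1.65 times the output tokens of GLM-5.1.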

Breakdown of the individual evaluations in the Artificial Analysis Intelligence Index v4.1.

Compare GLM-5.2 with other leading models at: https://artificialanalysis.ai/models/glm-5-2
