GLM-4.7: Advancing the Coding Capability

Original link: https://z.ai/blog/glm-4.7

## GLM-4.7: A Major Upgrade for Your Coding Partner

GLM-4.7 delivers significant gains over GLM-4.6, excelling in core coding, UI quality ("vibe coding"), tool use, and complex reasoning. It improves across key benchmarks, including +16.5% on Terminal Bench 2.0 and +12.4% on the highly challenging HLE benchmark, frequently surpassing models such as Gemini 3.0 Pro and Claude Sonnet 4.5. Headline features include enhanced thinking capabilities (interleaved, preserved, and turn-level thinking) for more stable and controllable handling of complex tasks, as well as improved webpage and slide generation with better aesthetics. GLM-4.7 is now available through the Z.ai API platform, OpenRouter, and popular coding agents such as Claude Code and Kilo Code. Existing GLM Coding Plan subscribers are upgraded automatically, while new users can access a Claude-level coding model at a significantly lower cost. The model weights are also publicly released on HuggingFace and ModelScope for local deployment.

## GLM-4.7: A New Open-Weight Coding Model

Z.ai has released GLM-4.7, a 358B-parameter mixture-of-experts (MoE) model with 32B active parameters, optimized for coding, reasoning, and tool use. It supports OpenAI-style tool calling and handles both English and Chinese. While its performance is claimed to be comparable to Claude 3.5 Sonnet/GPT-5, much of the discussion centers on the practicality of running it locally. Users debate the hardware requirements; some find that even a Mac Studio Ultra falls short because prompt processing runs slower than token generation. Despite the substantial RAM requirements (716 GB at FP16, roughly 220 GB at Q4_K_M), the appeal lies in potential independence from the large LLM providers. Many commenters highlight the cost-effectiveness of Z.ai's plans compared with proprietary models such as Claude, while acknowledging that local performance currently lags cloud solutions. The conversation also touches on model compression, the importance of speed, and comparisons with other models such as Gemini 3 and DeepSeek. Ultimately, GLM-4.7 represents a significant step forward for open-weight models, challenging the dominance of closed-source alternatives.

Original article

GLM-4.7, your new coding partner, is coming with the following features:

  • Core Coding: GLM-4.7 brings clear gains over its predecessor GLM-4.6 in multilingual agentic coding and terminal-based tasks, scoring (73.8%, +5.8%) on SWE-bench Verified, (66.7%, +12.9%) on SWE-bench Multilingual, and (41.0%, +16.5%) on Terminal Bench 2.0. GLM-4.7 also supports thinking before acting, with significant improvements on complex tasks in mainstream agent frameworks such as Claude Code, Kilo Code, Cline, and Roo Code.
  • Vibe Coding: GLM-4.7 takes a major step forward in UI quality. It produces cleaner, more modern webpages and generates better-looking slides with more accurate layout and sizing.
  • Tool Using: GLM-4.7 achieves significant improvements in tool use, with markedly better performance on benchmarks such as τ²-Bench and on web browsing via BrowseComp (see the tool-calling sketch after this list).
  • Complex Reasoning: GLM-4.7 delivers a substantial boost in mathematical and reasoning capabilities, achieving (42.8%, +12.4%) on the HLE (Humanity’s Last Exam) benchmark compared to GLM-4.6.

You can also see significant improvements in many other scenarios, such as chat, creative writing, and role-play.
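As a hedged illustration of the OpenAI-style tool calling that these agentic benchmarks exercise, the sketch below defines a single hypothetical `run_shell` tool and lets GLM-4.7 decide whether to call it. The base URL and request shape are assumptions modeled on the API docs linked later in this post; only standard OpenAI Python SDK calls are used.

```python
# Hedged sketch: OpenAI-style tool calling with GLM-4.7.
# The base_url and the "run_shell" tool are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_ZAI_API_KEY",                # from the Z.ai API platform
    base_url="https://api.z.ai/api/paas/v4/",  # assumed endpoint; see docs
)

tools = [{
    "type": "function",
    "function": {
        "name": "run_shell",  # hypothetical tool for this example
        "description": "Run a shell command and return its stdout.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}]

resp = client.chat.completions.create(
    model="glm-4.7",
    messages=[{"role": "user", "content": "List the files in the repo root."}],
    tools=tools,
)

# If the model chose to call the tool, inspect the structured arguments.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```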

Benchmark Performance. More detailed comparisons of GLM-4.7 against GPT-5, GPT-5.1-High, Claude Sonnet 4.5, Gemini 3.0 Pro, DeepSeek-V3.2, and Kimi K2 Thinking across 17 benchmarks (8 reasoning, 5 coding, and 4 agent benchmarks) are shown in the table below.

| Benchmark | GLM-4.7 | GLM-4.6 | Kimi K2 Thinking | DeepSeek-V3.2 | Gemini 3.0 Pro | Claude Sonnet 4.5 | GPT-5 High | GPT-5.1 High |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **Reasoning** |  |  |  |  |  |  |  |  |
| MMLU-Pro | 84.3 | 83.2 | 84.6 | 85.0 | 90.1 | 88.2 | 87.5 | 87.0 |
| GPQA-Diamond | 85.7 | 81.0 | 84.5 | 82.4 | 91.9 | 83.4 | 85.7 | 88.1 |
| HLE | 24.8 | 17.2 | 23.9 | 25.1 | 37.5 | 13.7 | 26.3 | 25.7 |
| HLE (w/ Tools) | 42.8 | 30.4 | 44.9 | 40.8 | 45.8 | 32.0 | 35.2 | 42.7 |
| AIME 2025 | 95.7 | 93.9 | 94.5 | 93.1 | 95.0 | 87.0 | 94.6 | 94.0 |
| HMMT Feb. 2025 | 97.1 | 89.2 | 89.4 | 92.5 | 97.5 | 79.2 | 88.3 | 96.3 |
| HMMT Nov. 2025 | 93.5 | 87.7 | 89.2 | 90.2 | 93.3 | 81.7 | 89.2 | - |
| IMOAnswerBench | 82.0 | 73.5 | 78.6 | 78.3 | 83.3 | 65.8 | 76.0 | - |
| LiveCodeBench-v6 | 84.9 | 82.8 | 83.1 | 83.3 | 90.7 | 64.0 | 87.0 | 87.0 |
| **Code Agent** |  |  |  |  |  |  |  |  |
| SWE-bench Verified | 73.8 | 68.0 | 73.4 | 73.1 | 76.2 | 77.2 | 74.9 | 76.3 |
| SWE-bench Multilingual | 66.7 | 53.8 | 61.1 | 70.2 | - | 68.0 | 55.3 | - |
| Terminal Bench Hard | 33.3 | 23.6 | 30.6 | 35.4 | 39.0 | 33.3 | 30.5 | 43.0 |
| Terminal Bench 2.0 | 41.0 | 24.5 | 35.7 | 46.4 | 54.2 | 42.8 | 35.2 | 47.6 |
| **General Agent** |  |  |  |  |  |  |  |  |
| BrowseComp | 52.0 | 45.1 | - | 51.4 | - | 24.1 | 54.9 | 50.8 |
| BrowseComp (w/ Context Manage) | 67.5 | 57.5 | 60.2 | 67.6 | 59.2 | - | - | - |
| BrowseComp-ZH | 66.6 | 49.5 | 62.3 | 65.0 | - | 42.4 | 63.0 | - |
| τ²-Bench | 87.4 | 75.2 | 74.3 | 85.3 | 90.7 | 87.2 | 82.4 | 82.7 |

Coding: AGI is a long journey, and benchmarks are only one way to evaluate performance. While the metrics provide necessary checkpoints, the most important thing is still how the model *feels* to use. True intelligence isn't just about acing a test or processing data faster; ultimately, the success of AGI will be measured by how seamlessly it integrates into our lives. This time, that means coding.

Frontend Development Showcases


Artifacts Showcases

Slides Creation Showcases

Interleaved Thinking & Preserved Thinking

GLM-4.7 enhances Interleaved Thinking, a feature introduced in GLM-4.5, and further introduces Preserved Thinking and Turn-level Thinking. By thinking between actions and staying consistent across turns, it makes complex tasks more stable and controllable:

  • Interleaved Thinking: GLM-4.7 thinks before every response and every tool call, improving instruction following and the quality of generation.
  • Preserved Thinking: In coding agent scenarios, GLM-4.7 automatically retains all thinking blocks across multi-turn conversations, reusing the existing reasoning instead of re-deriving from scratch. This reduces information loss and inconsistencies, and is well-suited for long-horizon, complex tasks.
  • Turn-level Thinking: GLM-4.7 supports per-turn control over reasoning within a session: disable thinking for lightweight requests to reduce latency and cost, or enable it for complex tasks to improve accuracy and stability (see the sketch after the link below).

More details: https://docs.z.ai/guides/capabilities/thinking-mode
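As a minimal sketch of Turn-level Thinking, the requests below toggle reasoning per call through an OpenAI-compatible client. The exact shape of the `thinking` field and the base URL are assumptions modeled on the thinking-mode guide linked above; consult that page for the authoritative parameters.

```python
# Minimal sketch of Turn-level Thinking: per-request reasoning control.
# The "thinking" field shape and base_url are assumptions; see the guide.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_ZAI_API_KEY",
    base_url="https://api.z.ai/api/paas/v4/",  # assumed endpoint
)

# Lightweight request: disable thinking to cut latency and cost.
quick = client.chat.completions.create(
    model="glm-4.7",
    messages=[{"role": "user", "content": "Rename variable x to count."}],
    extra_body={"thinking": {"type": "disabled"}},  # assumed field
)

# Complex task: enable thinking for better accuracy and stability.
hard = client.chat.completions.create(
    model="glm-4.7",
    messages=[{"role": "user", "content": "Plan a schema migration for a users table."}],
    extra_body={"thinking": {"type": "enabled"}},   # assumed field
)
print(quick.choices[0].message.content)
print(hard.choices[0].message.content)
```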

Call GLM-4.7 API via Z.ai API platform

The Z.ai API platform offers the GLM-4.7 model. For comprehensive API documentation and integration guidelines, please refer to https://docs.z.ai/guides/llm/glm-4.7. The model is also available worldwide through OpenRouter (https://openrouter.ai/).
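A minimal call might look like the sketch below, assuming the OpenAI-compatible chat-completions endpoint described in the documentation above; the base URL is an assumption to verify against those docs.

```python
# Minimal sketch: calling GLM-4.7 via the Z.ai API platform.
# Assumes an OpenAI-compatible endpoint; verify base_url in the docs.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_ZAI_API_KEY",
    base_url="https://api.z.ai/api/paas/v4/",  # assumed; see docs.z.ai
)

response = client.chat.completions.create(
    model="glm-4.7",
    messages=[{"role": "user", "content": "Write a function that deduplicates a list while preserving order."}],
)
print(response.choices[0].message.content)
```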

Use GLM-4.7 with Coding Agents

GLM-4.7 is now available to use within coding agents (Claude Code, Kilo Code, Roo Code, Cline and more).

For GLM Coding Plan subscribers: You'll be automatically upgraded to GLM-4.7. If you've previously customized the app configs (like ~/.claude/settings.json in Claude Code), simply update the model name to "glm-4.7" to complete the upgrade.
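For illustration only, a minimal ~/.claude/settings.json might look like the sketch below; the env keys and endpoint are assumptions based on a typical GLM Coding Plan setup, so keep your existing values and only change the model name.

```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
    "ANTHROPIC_AUTH_TOKEN": "your-coding-plan-api-key"
  },
  "model": "glm-4.7"
}
```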

For new users: Subscribing to the GLM Coding Plan gives you access to a Claude-level coding model at a fraction of the cost: just 1/7th the price with 3x the usage quota. Start building today: https://z.ai/subscribe.

Chat with GLM-4.7 on Z.ai

GLM-4.7 is accessible through Z.ai. Switch the model option to GLM-4.7 if the system does not do so automatically (not very AGI of it in that case :)).

Serve GLM-4.7 Locally

Model weights for GLM-4.7 are publicly available on HuggingFace and ModelScope. For local deployment, GLM-4.7 supports inference frameworks including vLLM and SGLang. Comprehensive deployment instructions are available in the official GitHub repository.
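As a sketch under stated assumptions, the snippet below serves the model with vLLM's offline Python API; the HuggingFace repo id `zai-org/GLM-4.7` and the parallelism setting are assumptions, and the official GitHub repository remains the authoritative deployment guide.

```python
# Hedged sketch: local inference with vLLM's offline Python API.
# The repo id and tensor_parallel_size are illustrative assumptions;
# a 358B-parameter MoE needs substantial multi-GPU memory.
from vllm import LLM, SamplingParams

llm = LLM(
    model="zai-org/GLM-4.7",   # assumed HuggingFace repo id
    tensor_parallel_size=8,    # shard across 8 GPUs (illustrative)
    trust_remote_code=True,
)

# temperature/top-p follow the default settings in the footnotes below;
# max_tokens is shortened here for a quick demo.
params = SamplingParams(temperature=1.0, top_p=0.95, max_tokens=1024)
outputs = llm.generate(["Write a quicksort in Python."], params)
print(outputs[0].outputs[0].text)
```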

Footnotes

1: Default settings (most tasks): temperature 1.0, top-p 0.95, max new tokens 131072. For multi-turn agentic tasks (τ²-Bench and Terminal Bench 2), enable Preserved Thinking mode.

2: Terminal Bench and SWE-bench Verified settings: temperature 0.7, top-p 1.0, max new tokens 16384.

3: τ²-Bench settings: temperature 0, max new tokens 16384. For τ²-Bench, we added an extra prompt in the Retail and Telecom interactions to avoid failures caused by users ending the interaction incorrectly; for the Airline domain, we applied the domain fixes proposed in the Claude Opus 4.5 release report.
