zai-org/GLM-5.1

Pricing per 1M tokens: $1.40 input / $4.40 output / $0.26 cached
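
As a rough illustration of the rates above, here is a small Python sketch; the `request_cost` helper is ours, and the assumption that cached input tokens are billed at the cached rate in place of the input rate is an inference from the listing, not documented behavior.

```python
# Hypothetical cost estimate for one GLM-5.1 request, from the listed
# per-1M-token rates. Cached-token billing semantics are an assumption.
PRICE_IN = 1.40 / 1_000_000      # $ per uncached input token
PRICE_OUT = 4.40 / 1_000_000     # $ per output token
PRICE_CACHED = 0.26 / 1_000_000  # $ per cached input token

def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Dollar cost of a single request."""
    uncached = input_tokens - cached_tokens
    return uncached * PRICE_IN + cached_tokens * PRICE_CACHED + output_tokens * PRICE_OUT

# e.g. a 120k-token prompt with 100k tokens served from cache and a 4k-token reply:
print(f"${request_cost(120_000, 4_000, cached_tokens=100_000):.4f}")  # ≈ $0.0716
```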

Quantization: fp8 · Context length: 202,752 tokens · Supports JSON output and function calling
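
Since the listing advertises JSON output, here is a minimal request sketch against an OpenAI-compatible endpoint; the `base_url` and the exact model identifier are placeholders, not documented values.

```python
# Minimal JSON-mode request through the OpenAI Python SDK against an
# OpenAI-compatible endpoint. base_url and model id are assumptions.
from openai import OpenAI

client = OpenAI(base_url="https://example-provider/v1", api_key="YOUR_API_KEY")

resp = client.chat.completions.create(
    model="zai-org/GLM-5.1",
    messages=[{
        "role": "user",
        "content": 'List three terminal emulators as JSON like {"names": [...]}.',
    }],
    response_format={"type": "json_object"},  # JSON mode, if the provider supports it
)
print(resp.choices[0].message.content)
```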

Model Information

language:
  • en
  • zh
library_name: transformers
license: mit
pipeline_tag: text-generation
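
Given `library_name: transformers`, local inference presumably follows the usual Hugging Face pattern. A minimal sketch, assuming the repo id from this page and leaving dtype and placement to `device_map="auto"`; flags such as `trust_remote_code` may be required depending on the released code:

```python
# Minimal local-inference sketch with Hugging Face transformers.
# The repo id is taken from this page; everything else is generic.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zai-org/GLM-5.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Summarize SWE-Bench Pro in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```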

GLM-5.1

👋 Join our WeChat or Discord community.
📖 Check out the GLM-5.1 blog and GLM-5 Technical report.
📍 Use GLM-5.1 API services on Z.ai API Platform.
🔜 GLM-5.1 will be available on chat.z.ai in the coming days.

[Paper] [GitHub]

Introduction

GLM-5.1 is our next-generation flagship model for agentic engineering, with significantly stronger coding capabilities than its predecessor. It achieves state-of-the-art performance on SWE-Bench Pro and leads GLM-5 by a wide margin on NL2Repo (repo generation) and Terminal-Bench 2.0 (real-world terminal tasks).

[Figure bench_51: benchmark comparison of GLM-5.1 against GLM-5 and contemporary models]

But the most meaningful leap goes beyond first-pass performance. Previous models, including GLM-5, tend to exhaust their repertoire early: they apply familiar techniques for quick initial gains, then plateau. Giving them more time doesn't help.

GLM-5.1, by contrast, is built to stay effective on agentic tasks over much longer horizons. We've found that the model handles ambiguous problems with better judgment and stays productive over longer sessions. It breaks complex problems down, runs experiments, reads results, and identifies blockers with real precision. By revisiting its reasoning and revising its strategy through repeated iteration, GLM-5.1 sustains optimization over hundreds of rounds and thousands of tool calls. The longer it runs, the better the result.
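
The iterate-and-revise behavior described above maps onto a standard tool-calling loop. Below is a schematic sketch assuming an OpenAI-compatible endpoint; the endpoint URL, model id, and the single toy `read_file` tool are illustrative, and a real agent harness would expose shell, editing, and search tools and run for many more rounds.

```python
# Schematic agent loop: the model requests tools, we execute them and feed
# the results back, and the loop continues until no tool calls remain.
import json
from openai import OpenAI

client = OpenAI(base_url="https://example-provider/v1", api_key="YOUR_API_KEY")

TOOLS = [{
    "type": "function",
    "function": {
        "name": "read_file",  # toy tool for the sketch
        "description": "Return the contents of a text file.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

def run_tool(name: str, args: dict) -> str:
    if name == "read_file":
        with open(args["path"]) as f:
            return f.read()
    return f"unknown tool: {name}"

messages = [{"role": "user", "content": "Read notes.txt and summarize it."}]
while True:
    resp = client.chat.completions.create(
        model="zai-org/GLM-5.1", messages=messages, tools=TOOLS
    )
    msg = resp.choices[0].message
    messages.append(msg)         # keep the full transcript for the next round
    if not msg.tool_calls:       # no tool requests left: the model is done
        break
    for call in msg.tool_calls:  # execute each requested tool, return its output
        result = run_tool(call.function.name, json.loads(call.function.arguments))
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})

print(msg.content)
```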

Benchmark

| Benchmark | GLM-5.1 | GLM-5 | Qwen3.6-Plus | Minimax M2.7 | DeepSeek-V3.2 | Kimi K2.5 | Claude Opus 4.6 | Gemini 3.1 Pro | GPT-5.4 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| HLE | 31.0 | 30.5 | 28.8 | 28.0 | 25.1 | 31.5 | 36.7 | 45.0 | 39.8 |
| HLE (w/ Tools) | 52.3 | 50.4 | 50.6 | - | 40.8 | 51.8 | 53.1* | 51.4* | 52.1* |
| AIME 2026 | 95.3 | 95.4 | 95.1 | 89.8 | 95.1 | 94.5 | 95.6 | 98.2 | 98.7 |
| HMMT Nov. 2025 | 94.0 | 96.9 | 94.6 | 81.0 | 90.2 | 91.1 | 96.3 | 94.8 | 95.8 |
| HMMT Feb. 2026 | 82.6 | 82.8 | 87.8 | 72.7 | 79.9 | 81.3 | 84.3 | 87.3 | 91.8 |
| IMOAnswerBench | 83.8 | 82.5 | 83.8 | 66.3 | 78.3 | 81.8 | 75.3 | 81.0 | 91.4 |
| GPQA-Diamond | 86.2 | 86.0 | 90.4 | 87.0 | 82.4 | 87.6 | 91.3 | 94.3 | 92.0 |
| SWE-Bench Pro | 58.4 | 55.1 | 56.6 | 56.2 | - | 53.8 | 57.3 | 54.2 | 57.7 |
| NL2Repo | 42.7 | 35.9 | 37.9 | 39.8 | - | 32.0 | 49.8 | 33.4 | 41.3 |
| Terminal-Bench 2.0 (Terminus-2) | 63.5 | 56.2 | 61.6 | - | 39.3 | 50.8 | 65.4 | 68.5 | - |
| Terminal-Bench 2.0 (Best self-reported) | 66.5 (Claude Code) | 56.2 (Claude Code) | - | 57.0 (Claude Code) | 46.4 (Claude Code) | - | - | - | 75.1 (Codex) |
| CyberGym | 68.7 | 48.3 | - | - | 17.3 | 41.3 | 66.6 | - | - |
| BrowseComp | 68.0 | 62.0 | - | - | 51.4 | 60.6 | - | - | - |
| BrowseComp (w/ Context Manage) | 79.3 | 75.9 | - | - | 67.6 | 74.9 | 84.0 | 85.9 | 82.7 |
| τ³-Bench | 70.6 | 69.2 | 70.7 | 67.6 | 69.2 | 66.0 | 72.4 | 67.1 | 72.9 |
| MCP-Atlas (Public Set) | 71.8 | 69.2 | 74.1 | 48.8 | 62.2 | 63.8 | 73.8 | 69.2 | 67.2 |
| Tool-Decathlon | 40.7 | 38.0 | 39.8 | 46.3 | 35.2 | 27.8 | 47.2 | 48.8 | 54.6 |
| Vending Bench 2 | $5,634.00 | $4,432.12 | $5,114.87 | - | $1,034.00 | $1,198.46 | $8,017.59 | $911.21 | $6,144.18 |