GLM-5.1 on DeepInfra: Z.AI’s Agentic Engineering Model



Published on 2026.05.25 by DeepInfra

Z.AI’s GLM-5.1 scores 58.4 on SWE-Bench Pro — ahead of both Claude Opus 4.6 (57.3) and GPT-5.4 (57.7) on real-world software engineering tasks. It’s the direct successor to GLM-5, designed for agentic engineering: long-horizon coding tasks, terminal operations, and repository-level work. The core design premise is that previous models, including GLM-5, tend to plateau after their initial gains — GLM-5.1 is built to keep improving across hundreds of rounds and thousands of tool calls.

What makes that design goal meaningful in practice is the model’s capacity for iterative strategy revision: breaking down ambiguous problems, running experiments, reading results, and identifying blockers rather than burning through a fixed repertoire early. It carries a 202,752-token context window, supports function calling and JSON natively, and ships under an MIT license, a meaningful detail for teams thinking about deployment flexibility. At $1.05 per million input tokens and $3.50 per million output tokens, it sits at a competitive price point relative to the frontier models it benchmarks against. It’s now available on DeepInfra.
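Long agent sessions accumulate history quickly, and even a 202,752-token window fills up over thousands of tool calls. Here is a minimal sketch of client-side trimming that keeps the system prompt plus the newest messages that fit. It assumes a crude four-characters-per-token estimate (not GLM-5.1’s actual tokenizer), and `trim_history` is a hypothetical helper, not part of any SDK:

```python
CONTEXT_WINDOW = 202_752  # GLM-5.1 context length quoted above
CHARS_PER_TOKEN = 4       # assumption: rough heuristic, not the model's tokenizer


def estimate_tokens(message: dict) -> int:
    """Very rough token estimate for one chat message (+4 assumed per-message overhead)."""
    return len(message.get("content") or "") // CHARS_PER_TOKEN + 4


def trim_history(messages: list[dict], reserve_for_output: int = 8_192) -> list[dict]:
    """Keep system messages and the most recent other messages that fit the budget."""
    budget = CONTEXT_WINDOW - reserve_for_output
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    used = sum(estimate_tokens(m) for m in system)
    kept: list[dict] = []
    # Walk backwards so the newest turns are preserved first.
    for message in reversed(rest):
        cost = estimate_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return system + list(reversed(kept))
```

This is deliberately naive. A production agent should count tokens with the model’s real tokenizer and avoid dropping an assistant message while keeping the `tool` results that answer it, since OpenAI-style chat APIs generally expect those to stay paired.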

What Makes This Model Different

GLM-5.1 is Z.AI’s successor to GLM-5, built around a specific thesis: most models hit a performance ceiling on long-running agentic tasks and then stall. GLM-5.1 is explicitly designed to keep improving as it’s given more time — sustaining performance across hundreds of rounds and thousands of tool calls rather than exhausting its strategy early.

The clearest evidence shows up in coding and terminal benchmarks, where GLM-5.1 pulls ahead of its predecessor by meaningful margins:

Benchmark	GLM-5.1	GLM-5	Notable Comparisons
SWE-Bench Pro	58.4	55.1	Claude Opus 4.6: 57.3, GPT-5.4: 57.7
NL2Repo	42.7	35.9	Claude Opus 4.6: 49.8, GPT-5.4: 41.3
Terminal-Bench 2.0	63.5	56.2	Claude Opus 4.6: 65.4
CyberGym	68.7	48.3	Claude Opus 4.6: 66.6

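On SWE-Bench Pro, GLM-5.1 lands ahead of both Claude Opus 4.6 and GPT-5.4. On NL2Repo it edges past GPT-5.4 (42.7 vs. 41.3) but still trails Claude Opus 4.6 (49.8) by a wide margin, and on Terminal-Bench 2.0 it comes within two points of Claude Opus 4.6 (63.5 vs. 65.4). CyberGym sees the most dramatic jump: from 48.3 to 68.7, beating Claude Opus 4.6’s 66.6. GLM-5.1 is also available on NVIDIA’s build platform, which gives you another access path if you’re already working within that ecosystem.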
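To make the head-to-head margins explicit, this small sketch recomputes them from the table above. The scores are copied straight from the table, and the `deltas` helper is purely illustrative; a positive value means GLM-5.1 is ahead, a negative value means it trails:

```python
# Scores copied from the benchmark table above.
BENCHMARKS = {
    "SWE-Bench Pro": {"GLM-5.1": 58.4, "GLM-5": 55.1, "Claude Opus 4.6": 57.3, "GPT-5.4": 57.7},
    "NL2Repo": {"GLM-5.1": 42.7, "GLM-5": 35.9, "Claude Opus 4.6": 49.8, "GPT-5.4": 41.3},
    "Terminal-Bench 2.0": {"GLM-5.1": 63.5, "GLM-5": 56.2, "Claude Opus 4.6": 65.4},
    "CyberGym": {"GLM-5.1": 68.7, "GLM-5": 48.3, "Claude Opus 4.6": 66.6},
}


def deltas(benchmark: str) -> dict[str, float]:
    """GLM-5.1's margin over each other listed model (positive = GLM-5.1 ahead)."""
    scores = BENCHMARKS[benchmark]
    ours = scores["GLM-5.1"]
    return {model: round(ours - score, 1) for model, score in scores.items() if model != "GLM-5.1"}


for name in BENCHMARKS:
    print(name, deltas(name))
```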

On general reasoning, the gains are more modest. GPQA-Diamond moves from 86.0 to 86.2, math benchmarks are roughly flat or slightly down (HMMT Nov: 96.9 → 94.0), and HLE with tools goes from 50.4 to 52.3. The model is tuned for agentic work, not pure reasoning competitions. GLM-5.1 also scores 79.3 on BrowseComp with context management enabled, ahead of DeepSeek-V3.2 (51.4) and competitive with other top-tier models.

The model supports a 202,752-token context window along with JSON output and function calling, the two capabilities most tool-use pipelines depend on. It handles English and Chinese, is MIT-licensed, and is served in fp4 quantization on DeepInfra under zai-org/GLM-5.1. If you want to understand the broader GLM model lineage, the GLM-4.5 blog post covers the foundation model that preceded this generation.
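Assuming function calling on DeepInfra’s OpenAI-compatible endpoint follows the standard OpenAI tool format (the usual convention for such APIs), the client-side half looks roughly like this: a tool schema you would pass as `tools=TOOLS`, plus a dispatcher that turns the model’s `tool_calls` into `tool` messages. The `run_tests` tool and its canned result are hypothetical, and the tool calls are plain dicts in the Chat Completions shape:

```python
import json


# Hypothetical local tool, for illustration only.
def run_tests(path: str) -> dict:
    return {"path": path, "passed": 12, "failed": 0}


# OpenAI-style tool schema describing run_tests to the model.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "run_tests",
            "description": "Run the test suite under a directory and report results.",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    }
]

REGISTRY = {"run_tests": run_tests}


def handle_tool_calls(tool_calls: list[dict]) -> list[dict]:
    """Execute each requested tool and build the 'tool' messages to send back."""
    results = []
    for call in tool_calls:
        fn = REGISTRY[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        output = fn(**args)
        results.append({"role": "tool", "tool_call_id": call["id"], "content": json.dumps(output)})
    return results
```

With the official SDK, you would pass `tools=TOOLS` to `client.chat.completions.create(...)`, convert the returned `message.tool_calls` to dicts (for example via `model_dump()`), and append the output of `handle_tool_calls` to your `messages` before the next request.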

Getting Started on DeepInfra

GLM-5.1 is available now on DeepInfra under the identifier zai-org/GLM-5.1 as a public endpoint. Pricing is usage-based: $1.05 per 1M input tokens, $3.50 per 1M output tokens, and $0.205 per 1M cached tokens. Private endpoint deployment is also supported if you need dedicated capacity — configure that directly from the DeepInfra dashboard.
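For budgeting, those per-token prices reduce to a simple formula. A minimal sketch, assuming `cached_tokens` is the portion of the input served from cache and billed at the cached rate instead of the standard input rate:

```python
# Public-endpoint prices quoted above, in USD per 1M tokens.
PRICE_INPUT = 1.05
PRICE_OUTPUT = 3.50
PRICE_CACHED = 0.205


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate USD cost. Assumes cached_tokens is a subset of input_tokens billed at the cached rate."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    total = uncached * PRICE_INPUT + cached_tokens * PRICE_CACHED + output_tokens * PRICE_OUTPUT
    return total / 1_000_000
```

For example, under that assumption a session that sends 2M input tokens (1M of them cache hits) and receives 500K output tokens works out to $1.05 + $0.205 + $1.75 = $3.005.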

DeepInfra gives you access to GLM-5.1 through an OpenAI-compatible API with zero infrastructure setup. DeepInfra operates with a zero-retention policy and is SOC 2 and ISO 27001 certified. If you’re planning to use GLM-5.1 for production coding workflows in Claude Code, Kilo Code, Cline, or similar tools, Z.AI’s GLM Coding Plan is worth reviewing for team-level access options.

To make your first call, grab your API key from the Dashboard and swap in the model identifier:

curl "https://api.deepinfra.com/v1/openai/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPINFRA_TOKEN" \
  -d '{
      "model": "zai-org/GLM-5.1",
      "messages": [
        {
          "role": "user",
          "content": "Hello!"
        }
      ]
    }'

import os

from openai import OpenAI

# Read the DeepInfra API token from the environment instead of hard-coding it.
client = OpenAI(
    api_key=os.environ["DEEPINFRA_TOKEN"],
    base_url="https://api.deepinfra.com/v1/openai",
)

response = client.chat.completions.create(
    model="zai-org/GLM-5.1",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

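import OpenAI from "openai";

// Read the DeepInfra API token from the environment instead of hard-coding it.
const openai = new OpenAI({
  apiKey: process.env.DEEPINFRA_TOKEN,
  baseURL: "https://api.deepinfra.com/v1/openai",
});

// Top-level await requires an ES module (e.g. a .mjs file or "type": "module").
const response = await openai.chat.completions.create({
  model: "zai-org/GLM-5.1",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);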

The only things that change from a standard OpenAI call are the base URL (https://api.deepinfra.com/v1/openai), your DeepInfra token, and the model name — the official OpenAI Python and Node.js SDKs work without any modifications. Head to deepinfra.com/zai-org/GLM-5.1 to start building.
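Because GLM-5.1 is pitched at runs spanning hundreds of rounds, a single request is rarely the whole story. This sketch outlines a generic multi-round tool loop with a round cap. `run_agent` and `call_model` are hypothetical: `call_model` is a function you supply (for example, a wrapper around `client.chat.completions.create(...)` that returns the assistant message as a dict), and tools are plain Python callables keyed by name:

```python
import json
from typing import Callable

# A model call takes the running message list and returns one assistant message dict.
ModelFn = Callable[[list[dict]], dict]


def run_agent(call_model: ModelFn, tools: dict[str, Callable[..., object]],
              task: str, max_rounds: int = 200) -> str:
    """Alternate model turns and tool executions until the model replies without tool calls."""
    messages: list[dict] = [{"role": "user", "content": task}]
    for _ in range(max_rounds):
        reply = call_model(messages)
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            # No tool requested: treat this reply as the final answer.
            return reply.get("content") or ""
        for call in tool_calls:
            fn = tools[call["function"]["name"]]
            result = fn(**json.loads(call["function"]["arguments"]))
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)})
    raise RuntimeError(f"no final answer after {max_rounds} rounds")
```

Keeping the model call behind a plain function makes the loop easy to test with a scripted fake, and the `max_rounds` cap stops a runaway session from consuming tokens indefinitely.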

Conclusion

GLM-5.1 makes a credible case for itself in the scenarios where agentic models tend to break down: long-running tasks, messy repositories, and multi-step terminal workflows that demand sustained reasoning rather than a single flash of capability. Its results against Claude Opus 4.6 and GPT-5.4 are not uniform wins, since it still trails Claude Opus 4.6 on NL2Repo and Terminal-Bench 2.0, but its clearest gains land on SWE-Bench Pro and CyberGym, the kind of repository and security work it was deliberately tuned for and that developers actually need to automate.

That opens up real engineering applications: autonomous PR triage pipelines, self-directed debugging agents, or repo-scale refactoring tools that don’t fall apart midway through. If any of that maps to what you’re building, GLM-5.1 is worth running through your eval pipeline. Keep in mind that “agentic model” here means something specific: not just a model with tool access, but one designed around the iterative, multi-step problem solving that real engineering tasks demand. Head to deepinfra.com/zai-org/GLM-5.1 to get started.
