🚀 New models by Bria.ai, generate and edit images at scale 🚀
zai-org/
Compared with GLM-4.5, GLM-4.6 brings several key improvements: Longer context window: The context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex agentic tasks. Superior coding performance: The model achieves higher scores on code benchmarks and demonstrates better real-world performance in applications such as Claude Code、Cline、Roo Code and Kilo Code, including improvements in generating visually polished front-end pages. Advanced reasoning: GLM-4.6 shows a clear improvement in reasoning performance and supports tool use during inference, leading to stronger overall capability. More capable agents: GLM-4.6 exhibits stronger performance in tool using and search-based agents, and integrates more effectively within agent frameworks. Refined writing: Better aligns with human preferences in style and readability, and performs more naturally in role-playing scenarios.
You can POST to our OpenAI Chat Completions compatible endpoint.
Given a list of messages from a conversation, the model will return a response.
curl "https://api.deepinfra.com/v1/openai/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $DEEPINFRA_TOKEN" \
-d '{
"model": "zai-org/GLM-4.6",
"messages": [
{
"role": "user",
"content": "Hello!"
}
]
}'
To which you'd get something like:
{
"id": "chatcmpl-guMTxWgpFf",
"object": "chat.completion",
"created": 1694623155,
"model": "zai-org/GLM-4.6",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": " Hello! It's nice to meet you. Is there something I can help you with or would you like to chat for a bit?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 15,
"completion_tokens": 16,
"total_tokens": 31,
"estimated_cost": 0.0000268
}
}
To create a longer chat-like conversation you just have to add each response message and each of the user messages to every request. This way the model will have the context and will be able to provide better answers. You can tweak it even further by providing a system message.
curl "https://api.deepinfra.com/v1/openai/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $DEEPINFRA_TOKEN" \
-d '{
"model": "zai-org/GLM-4.6",
"messages": [
{
"role": "system",
"content": "Respond like a michelin starred chef."
},
{
"role": "user",
"content": "Can you name at least two different techniques to cook lamb?"
},
{
"role": "assistant",
"content": "Bonjour! Let me tell you, my friend, cooking lamb is an art form, and I'"'"'m more than happy to share with you not two, but three of my favorite techniques to coax out the rich, unctuous flavors and tender textures of this majestic protein. First, we have the classic \"Sous Vide\" method. Next, we have the ancient art of \"Sous le Sable\". And finally, we have the more modern technique of \"Hot Smoking.\""
},
{
"role": "user",
"content": "Tell me more about the second method."
}
]
}'
The conversation above might return something like the following
{
"id": "chatcmpl-b23a3fb60cde42ce8f24bb980b4dee87",
"object": "chat.completion",
"created": 1715688169,
"model": "zai-org/GLM-4.6",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Sous le Sable, my friend! It's an ancient technique that's been used for centuries in the Middle East and North Africa. The name itself..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 149,
"total_tokens": 487,
"completion_tokens": 338,
"estimated_cost": 0.00035493
}
}
The longer the conversation gets, the more time it takes the model to generate the response. The number of messages that you can have in a conversation is limited by the context size of a model. Larger models also usually take more time to respond.
You can turn any of the requests above into a streaming request by passing "stream": true
:
curl "https://api.deepinfra.com/v1/openai/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $DEEPINFRA_TOKEN" \
-d '{
"model": "zai-org/GLM-4.6",
"stream": true,
"messages": [
{
"role": "user",
"content": "Hello!"
}
]
}'
to which you'd get a sequence of SSE events, finishing with [DONE]
.
data: {"id": "Rc5hsIPHOSfMP3rNSFUw9tfR", "object": "chat.completion.chunk", "created": 1694623354, "model": "zai-org/GLM-4.6", "choices": [{"index": 0, "delta": {"role": "assistant", "content": " "}, "finish_reason": null}]}
data: {"id": "Rc5hsIPHOSfMP3rNSFUw9tfR", "object": "chat.completion.chunk", "created": 1694623354, "model": "zai-org/GLM-4.6", "choices": [{"index": 0, "delta": {"role": "assistant", "content": " Hi"}, "finish_reason": null}]}
data: {"id": "Rc5hsIPHOSfMP3rNSFUw9tfR", "object": "chat.completion.chunk", "created": 1694623354, "model": "zai-org/GLM-4.6", "choices": [{"index": 0, "delta": {"role": "assistant", "content": "!"}, "finish_reason": null}]}
data: {"id": "Rc5hsIPHOSfMP3rNSFUw9tfR", "object": "chat.completion.chunk", "created": 1694623354, "model": "zai-org/GLM-4.6", "choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}, "finish_reason": null}]}
data: {"id": "Rc5hsIPHOSfMP3rNSFUw9tfR", "object": "chat.completion.chunk", "created": 1694623354, "model": "zai-org/GLM-4.6", "choices": [{"index": 0, "delta": {"role": "assistant", "content": "</s>"}, "finish_reason": null}]}
data: {"id": "Rc5hsIPHOSfMP3rNSFUw9tfR", "object": "chat.completion.chunk", "created": 1694623354, "model": "zai-org/GLM-4.6", "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}
data: [DONE]
© 2025 Deep Infra. All rights reserved.