
Kimi K2.6 is Moonshot AI’s latest flagship open-source model, released on April 20, 2026 under a Modified MIT license. It is a native multimodal agentic model built on a 1-trillion-parameter Mixture-of-Experts (MoE) architecture, with 32 billion parameters activated per token. The model is designed for long-horizon coding, autonomous execution, and multi-agent orchestration, and is available via the DeepInfra API as moonshotai/Kimi-K2.6.
Agent Swarm and Multi-Agent Orchestration
Kimi K2.6 includes an Agent Swarm system that scales to 300 domain-specialized sub-agents, executing up to 4,000 coordinated steps in a single autonomous run — up from 100 sub-agents and 1,500 steps in K2.5. The orchestration layer decomposes complex prompts into parallel subtasks and synthesizes outputs into finished deliverables such as research documents, functional websites, or spreadsheets.
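The decompose, run in parallel, then synthesize pattern described here can be illustrated with a small client-side sketch. This is not Moonshot's orchestration layer: `run_subagent`, `orchestrate`, and the `plan` dictionary are hypothetical names invented for illustration, and the sub-agent is a stub that returns a string instead of calling a model.

```python
import asyncio


async def run_subagent(role: str, subtask: str) -> str:
    """Hypothetical stand-in for one sub-agent; a real one would call a model."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"[{role}] {subtask}: done"


async def orchestrate(subtasks: dict[str, str]) -> str:
    """Fan out subtasks concurrently, then merge the results in input order."""
    results = await asyncio.gather(
        *(run_subagent(role, task) for role, task in subtasks.items())
    )
    return "\n".join(results)


# Hypothetical decomposition of one prompt into role-specific subtasks.
plan = {
    "researcher": "collect sources",
    "writer": "draft report",
    "reviewer": "check citations",
}
report = asyncio.run(orchestrate(plan))
print(report)
```

`asyncio.gather` returns results in the order the subtasks were submitted, which keeps the synthesis step deterministic even when sub-agents finish at different times.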
Coding and Full-Stack Development
The model is optimized for software engineering across Rust, Go, and Python, handling tasks from front-end generation to DevOps and performance optimization. Its coding-driven design capability transforms text prompts and visual mockups into production-ready interfaces. Note: image input is not exposed through the API — vision capabilities are used internally by the model’s MoonViT encoder but are not available as a direct API input parameter.
Long-Horizon Autonomous Execution
Kimi K2.6 supports persistent background agent execution, including continuous runs of 12+ hours with thousands of tool calls. It is designed for cross-platform operations and multi-step workflows that extend well beyond standard chat interaction patterns.
Architecture
Benchmark Performance
Kimi K2.6 leads on both agentic benchmarks and on SWE-Bench Pro, but trails Gemini 3.1 Pro on LiveCodeBench v6, all three competitors on AIME 2026, and GPT-5.4 and Gemini 3.1 Pro on MathVision. All Kimi K2.6 results were obtained with thinking mode enabled. Asterisked (*) competitor scores were re-evaluated by Moonshot under the same conditions because published scores were not available from the original sources.
| Category | Benchmark | Kimi K2.6 | GPT-5.4 | Claude Opus 4.6 | Gemini 3.1 Pro |
|---|---|---|---|---|---|
| Agentic | HLE-Full (w/ tools) | 54.0 | 52.1 | 53.0 | 51.4 |
| Agentic | DeepSearchQA (Acc) | 83.0 | 63.7 | 80.6 | 60.2 |
| Coding | SWE-Bench Pro | 58.6 | 57.7 | 53.4 | 54.2 |
| Coding | LiveCodeBench v6 | 89.6 | n/a | 88.8 | 91.7 |
| Reasoning | AIME 2026 | 96.4 | 99.2 | 96.7* | 98.3* |
| Vision | MathVision (w/ Py) | 93.2 | 96.1* | 84.6* | 95.7* |
Additional results:
Kimi K2.6 is available via DeepInfra’s OpenAI-compatible API.
Authentication
API Endpoint Basics
cURL Example
```bash
curl -X POST \
  https://api.deepinfra.com/v1/openai/chat/completions \
  -H "Authorization: Bearer YOUR_DEEPINFRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moonshotai/Kimi-K2.6",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What are the core capabilities of Kimi K2.6?"}
    ],
    "max_tokens": 150,
    "temperature": 0.7
  }'
```
Python Example
```python
import requests
import json

API_KEY = "YOUR_DEEPINFRA_API_KEY"
API_URL = "https://api.deepinfra.com/v1/openai/chat/completions"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

payload = {
    "model": "moonshotai/Kimi-K2.6",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What are the core capabilities of Kimi K2.6?"}
    ],
    "max_tokens": 150,
    "temperature": 0.7
}

try:
    response = requests.post(API_URL, headers=headers, data=json.dumps(payload))
    response.raise_for_status()
    print(json.dumps(response.json(), indent=2))
except requests.exceptions.RequestException as e:
    print(f"API request failed: {e}")
```
Common Parameters
| Parameter | Type | Description |
|---|---|---|
| model | string | Required. Use moonshotai/Kimi-K2.6. |
| messages | array | Required. The conversation history (system, user, assistant). |
| max_tokens | integer | Optional. Limits the length of the generated output. |
| temperature | number | Optional. Controls randomness (0.0 to 2.0). |
| stream | boolean | Optional. If true, sends partial deltas as server-sent events. |
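When `stream` is true, the response arrives as server-sent events rather than a single JSON object. The sketch below shows one way to reassemble the text, assuming the chunk layout used by OpenAI-compatible APIs (`data:` lines carrying `choices[].delta.content`, terminated by `data: [DONE]`). The helper name `iter_stream_text` and the sample events are illustrative, not real API output.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        # Skip blank keep-alive lines and anything that is not a data event.
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        # Assumed end-of-stream sentinel for OpenAI-compatible servers.
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Offline demonstration with hand-written sample events.
sample_events = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    "data: [DONE]",
]
streamed_text = "".join(iter_stream_text(sample_events))
print(streamed_text)
```

With the `requests` library, the same helper could be fed from a live call by adding `"stream": True` to the payload, passing `stream=True` to `requests.post`, and iterating over `response.iter_lines(decode_unicode=True)`.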
Response Format
A successful request returns a JSON object. Key fields: choices[0].message.content contains the generated text; usage contains token counts for billing.
```json
{
  "id": "chatcmpl-xxx",
  "model": "moonshotai/Kimi-K2.6",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Kimi K2.6 is a multimodal agentic model..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 130,
    "total_tokens": 155
  }
}
```
Kimi K2.6 on DeepInfra uses usage-based pricing calculated per 1 million tokens:
| Token Type | Price per 1M Tokens |
|---|---|
| Input Tokens | $0.75 |
| Output Tokens | $3.50 |
| Cached Input Tokens | $0.15 |
For the most current information on pricing, visit the DeepInfra Pricing Page.
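As a rough illustration of how these rates translate into per-request cost, the sketch below applies them to the `usage` object returned by the API. The function name `estimate_cost_usd` and the `cached_tokens` argument are hypothetical, and the code assumes cached tokens are a subset of `prompt_tokens` billed at the cached rate. Actual billing rules may differ, so treat the result as an estimate.

```python
# Prices in USD per 1M tokens, copied from the pricing table; they may change.
PRICE_INPUT = 0.75
PRICE_OUTPUT = 3.50
PRICE_CACHED_INPUT = 0.15


def estimate_cost_usd(usage: dict, cached_tokens: int = 0) -> float:
    """Estimate the cost of one request from its `usage` object.

    Assumption: cached tokens are a subset of prompt_tokens and are billed
    at the cached rate instead of the regular input rate.
    """
    prompt = usage["prompt_tokens"]
    completion = usage["completion_tokens"]
    if not 0 <= cached_tokens <= prompt:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached = prompt - cached_tokens
    total = (
        uncached * PRICE_INPUT
        + cached_tokens * PRICE_CACHED_INPUT
        + completion * PRICE_OUTPUT
    )
    return total / 1_000_000


# Token counts taken from the sample response in the Response Format section.
example_usage = {"prompt_tokens": 25, "completion_tokens": 130, "total_tokens": 155}
print(f"${estimate_cost_usd(example_usage):.8f}")
```

For the sample response, 25 input tokens and 130 output tokens work out to roughly $0.00047, which shows that output tokens dominate the bill for short prompts.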
Kimi K2.6 is a capable open-weight agentic model that leads on long-horizon coding and multi-agent orchestration benchmarks while remaining competitive with closed-source frontier models on key software engineering tasks. Its 262K context window, Agent Swarm architecture, and open weights under a Modified MIT license give teams both performance and deployment flexibility, including the option to self-host on vLLM or SGLang. The model’s main trade-offs are a smaller context window than some proprietary alternatives (262K versus 1M), no native image input via the API, and lower scores than GPT-5.4 on pure math reasoning benchmarks.
To start building with Kimi K2.6, explore the DeepInfra API documentation or visit the Moonshot AI tech blog for the full technical report.