
MiMo-V2.5 is a native omnimodal model developed by XiaomiMiMo, designed to process and understand text, image, video, and audio through a unified architecture rather than relying on “bolted-on” components for each modality.
Built on a 310-billion-parameter Sparse Mixture of Experts (MoE) architecture — with only 15 billion parameters activated during inference — MiMo-V2.5 offers a strong balance of high-tier reasoning and computational efficiency. With a 1-million-token context window and agentic capabilities, it is engineered for complex multimodal perception, long-context reasoning, and autonomous workflows.
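As a quick back-of-the-envelope check on the sparsity figures above, the snippet below computes the share of parameters that is active per token. It uses only the two numbers quoted in this article. The variable names are illustrative.

```python
# Parameter counts quoted in this article, in billions.
TOTAL_PARAMS_B = 310
ACTIVE_PARAMS_B = 15

# Fraction of the model's weights that participates in each forward pass.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

print(f"Active share per token: {active_fraction:.1%}")  # prints "Active share per token: 4.8%"
```

Per-token compute scales roughly with the active parameters, while all 310 billion weights still have to be held in memory. That is the usual trade-off of a sparse MoE design.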
MiMo-V2.5 is a meaningful step forward from its predecessor, MiMo-V2-Flash. By using native, dedicated encoders for each data type, the model integrates its modalities more tightly than is common in large-scale models.
Key Technical Features
Configuration Notice: Developers who downloaded the model prior to recent repository updates should re-pull the config.json and tokenizer_config.json files to ensure optimal performance and avoid degraded behavior.
MiMo-V2.5 demonstrates competitive performance against frontier closed-source models, particularly in coding, temporal video reasoning, and agentic decision-making.
The model’s use of Reinforcement Learning (RL) places it near the Pareto frontier for daily agentic tasks.
| Benchmark | Category | MiMo-V2.5 Score | Claude Opus 4.6 | Gemini 3.1 Pro |
|---|---|---|---|---|
| Coding (General) | Programming/Logic | 71.8 | 77.1 | 67.8 |
| Claw-Eval Text | General Agentic | 65.8 | 70.8 | 68.5 |
| Terminal-Bench 2.0 | CLI Operations | 56.1 | 57.3 | 54.2 |
MiMo-V2.5 shows sharp perception for temporal reasoning, matching or approaching industry leaders in video and image understanding.
| Benchmark | Modality | MiMo-V2.5 Score | Gemini 3 Pro | Kimi K2.6 |
|---|---|---|---|---|
| Image Understanding | Vision-Language | 81.0 | 81.4 | 80.4 |
| Video-MME | Video | 83.5 | 84.2 | — |
| MMMU-Pro | Multi-discipline | 88.5 | — | — |
| CharXiv RQ | Chart/Diagram | 77.9 | 81.0 | 79.4 |
The model supports a context window of up to 1,000,000 tokens, validated on long-context benchmarks such as Graphwalks, which tests path-finding and retrieval. A learnable attention sink bias helps keep reasoning accuracy stable even at the 1M-token limit.
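The article does not say how the attention sink bias is implemented. The sketch below shows one common formulation, offered as an assumption rather than MiMo-V2.5's actual method: a learnable scalar logit is added to the softmax denominator, so a head can put probability mass on "nothing" instead of spreading it across distant tokens. The function name `softmax_with_sink` and the example scores are hypothetical.

```python
import math


def softmax_with_sink(scores, sink_logit):
    """Softmax over attention scores with one extra sink logit in the denominator.

    Assumed formulation, not confirmed for MiMo-V2.5. The returned weights
    cover only the real tokens, so they sum to less than 1 whenever the
    sink absorbs some probability mass.
    """
    # Subtract the running maximum for numerical stability.
    peak = max(max(scores), sink_logit)
    exps = [math.exp(s - peak) for s in scores]
    sink = math.exp(sink_logit - peak)
    denom = sum(exps) + sink
    return [e / denom for e in exps]


weights = softmax_with_sink([2.0, 1.0, 0.5], sink_logit=0.0)
print(sum(weights))  # strictly between 0 and 1: the sink took the remainder
```

With `sink_logit` set to negative infinity the sink contributes nothing and the function reduces to an ordinary softmax. In a trained model the sink logit would be a learned parameter, typically one per attention head.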
MiMo-V2.5 is hosted on DeepInfra, providing high-performance, low-latency inference via an OpenAI-compatible API.
Retrieve your API key from your DeepInfra Dashboard and include it in your HTTP headers:
Authorization: Bearer <YOUR_DEEPINFRA_API_KEY>
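To keep the key out of source code, it can be read from an environment variable and turned into the header above. This is a minimal sketch. The helper name `build_auth_headers` is hypothetical, and `DEEPINFRA_API_KEY` is the same variable name used in the examples in this article.

```python
import os


def build_auth_headers(env_var="DEEPINFRA_API_KEY"):
    """Build request headers from an API key stored in the environment.

    Hypothetical helper for illustration; raises early if the key is missing
    so the failure is clearer than an HTTP 401 later on.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed directly as the `headers` argument of an HTTP client call.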
Using cURL
```bash
curl -X POST https://api.deepinfra.com/v1/openai/chat/completions \
  -H "Authorization: Bearer $DEEPINFRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "XiaomiMiMo/MiMo-V2.5",
    "messages": [
      {
        "role": "user",
        "content": "Explain the advantages of a hybrid attention architecture in 2 sentences."
      }
    ]
  }'
```
Using Python
```python
import os

import requests

url = "https://api.deepinfra.com/v1/openai/chat/completions"
api_key = os.getenv("DEEPINFRA_API_KEY")  # read the key from the environment

payload = {
    "model": "XiaomiMiMo/MiMo-V2.5",
    "messages": [{"role": "user", "content": "Explain the advantages of a hybrid attention architecture."}],
}

# The timeout keeps a stalled request from hanging indefinitely.
response = requests.post(url, headers={"Authorization": f"Bearer {api_key}"}, json=payload, timeout=60)
response.raise_for_status()  # surface HTTP errors instead of printing an error body
print(response.json())
```
Pricing is usage-based, calculated per 1 million tokens. DeepInfra offers two tiers to balance cost and priority.
| Tier | Input Price (per 1M tokens) | Output Price (per 1M tokens) | Cached Input Price (per 1M tokens) |
|---|---|---|---|
| Standard | $0.40 | $2.00 | $0.08 |
| Priority (1.5×) | $0.60 | $3.00 | $0.12 |
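The table above translates into a simple cost estimate. The sketch below uses the listed prices and assumes that cached input tokens are a subset of the input tokens and are billed at the cached rate instead of the regular input rate. The article does not spell out that billing detail, so treat it as an assumption. The function name `estimate_cost` is illustrative.

```python
# Prices in USD per 1 million tokens, copied from the table above.
PRICES_PER_MILLION = {
    "standard": {"input": 0.40, "output": 2.00, "cached_input": 0.08},
    "priority": {"input": 0.60, "output": 3.00, "cached_input": 0.12},
}


def estimate_cost(input_tokens, output_tokens, cached_input_tokens=0, tier="standard"):
    """Estimate the USD cost of one request.

    Assumption: cached_input_tokens is counted within input_tokens and is
    billed at the cached rate rather than the regular input rate.
    """
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    prices = PRICES_PER_MILLION[tier]
    uncached_tokens = input_tokens - cached_input_tokens
    total = (
        uncached_tokens * prices["input"]
        + cached_input_tokens * prices["cached_input"]
        + output_tokens * prices["output"]
    )
    return total / 1_000_000


standard_cost = estimate_cost(200_000, 10_000, cached_input_tokens=50_000)
priority_cost = estimate_cost(200_000, 10_000, cached_input_tokens=50_000, tier="priority")
print(f"standard: ${standard_cost:.4f}, priority: ${priority_cost:.4f}")
```

For a 200,000-token prompt with 50,000 cached tokens and a 10,000-token reply, this comes to roughly $0.084 on the Standard tier and $0.126 on the Priority tier, which matches the 1.5× multiplier.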
XiaomiMiMo’s MiMo-V2.5 is a capable and versatile model for the next generation of AI applications. By combining a 1M token context window with native omnimodal understanding and an efficient MoE architecture, it gives developers frontier-model capabilities at a comparatively lower resource cost.
Whether you are building agentic workflows, analyzing hour-long videos, or processing large document sets, MiMo-V2.5 offers the performance and flexibility for professional-grade deployment.
© 2026 DeepInfra. All rights reserved.