
Browse deepinfra models:

All categories and models you can try out and use directly on deepinfra:

Qwen/Qwen3-235B-A22B
featured · fp8 · 40k context · $0.20/$0.60 in/out per Mtoken
  • text-generation

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support.
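Any model in this catalog is addressed by its listed ID. As a minimal sketch, assuming the OpenAI-compatible chat-completions convention that deepinfra exposes (the endpoint URL and payload shape here are assumptions; check the official docs before relying on them), a request body might be built like this:

```python
import json

# Assumed OpenAI-compatible endpoint; verify against deepinfra's documentation.
BASE_URL = "https://api.deepinfra.com/v1/openai/chat/completions"

def build_chat_payload(model_id: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completions payload for a given model ID."""
    return {
        "model": model_id,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_payload("Qwen/Qwen3-235B-A22B",
                             "Explain MoE routing in one paragraph.")
print(json.dumps(payload, indent=2))
```

The same payload shape works for every text-generation model below; only the `model` field changes.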

Qwen/Qwen3-30B-A3B
featured · fp8 · 40k context · $0.10/$0.30 in/out per Mtoken
  • text-generation

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support.

Qwen/Qwen3-32B
featured · fp8 · 40k context · $0.10/$0.30 in/out per Mtoken
  • text-generation

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support.

Qwen/Qwen3-14B
featured · fp8 · 40k context · $0.08/$0.24 in/out per Mtoken
  • text-generation

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support.

meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
featured · fp8 · 1024k context · $0.17/$0.60 in/out per Mtoken
  • text-generation

The Llama 4 collection is a family of natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. Llama 4 Maverick is a 17-billion-parameter model with 128 experts.

meta-llama/Llama-4-Scout-17B-16E-Instruct
featured · bfloat16 · 320k context · $0.08/$0.30 in/out per Mtoken
  • text-generation

The Llama 4 collection is a family of natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. Llama 4 Scout is a 17-billion-parameter model with 16 experts.

deepseek-ai/DeepSeek-R1-Turbo
featured · fp4 · 32k context · $1.00/$3.00 in/out per Mtoken
  • text-generation

We introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.

deepseek-ai/DeepSeek-R1
featured · fp8 · 160k context · $0.50/$2.18 in/out per Mtoken
  • text-generation

We introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.

Qwen/QwQ-32B
featured · bfloat16 · 128k context · $0.15/$0.20 in/out per Mtoken
  • text-generation

QwQ is the reasoning model of the Qwen series. Compared with conventional instruction-tuned models, QwQ, which is capable of thinking and reasoning, can achieve significantly enhanced performance in downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model, capable of competitive performance against state-of-the-art reasoning models such as DeepSeek-R1 and o1-mini.

deepseek-ai/DeepSeek-V3-0324
featured · fp8 · 160k context · $0.30/$0.88 in/out per Mtoken
  • text-generation

DeepSeek-V3-0324 is a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated per token; it is an improved iteration of DeepSeek-V3.
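The per-Mtoken prices above translate into per-request cost with simple arithmetic: at DeepSeek-V3-0324's listed $0.30/$0.88 rates, a call with 10,000 input tokens and 2,000 output tokens costs 10,000/1,000,000 × $0.30 + 2,000/1,000,000 × $0.88 ≈ $0.0048. A small helper (rates taken from the listing above; the token counts are illustrative):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price_per_mtok: float, out_price_per_mtok: float) -> float:
    """Return the USD cost of one request at the listed in/out per-Mtoken rates."""
    return (input_tokens / 1_000_000) * in_price_per_mtok \
         + (output_tokens / 1_000_000) * out_price_per_mtok

# DeepSeek-V3-0324 rates from the listing: $0.30 in / $0.88 out per Mtoken.
cost = request_cost(10_000, 2_000, 0.30, 0.88)
print(f"${cost:.4f}")  # → $0.0048
```

The same helper applies to every per-Mtoken model above; the per-character text-to-speech models below are priced per million input characters instead.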

google/gemma-3-27b-it
featured · bfloat16 · 128k context · $0.10/$0.20 in/out per Mtoken
  • text-generation

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open-source model and the successor to Gemma 2.

google/gemma-3-12b-it
featured · bfloat16 · 128k context · $0.05/$0.10 in/out per Mtoken
  • text-generation

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 12B is Google's latest open-source model and the successor to Gemma 2.

google/gemma-3-4b-it
featured · bfloat16 · 128k context · $0.02/$0.04 in/out per Mtoken
  • text-generation

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 4B is Google's latest open-source model and the successor to Gemma 2.

hexgrad/Kokoro-82M
featured · $0.80 per M characters
  • text-to-speech

Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.

nari-labs/Dia-1.6B
featured · $20.00 per M characters
  • text-to-speech

Dia directly generates highly realistic dialogue from a transcript. You can condition the output on audio, enabling emotion and tone control. The model can also produce nonverbal communications like laughter, coughing, clearing throat, etc.

canopylabs/orpheus-3b-0.1-ft
featured · $7.00 per M characters
  • text-to-speech

Orpheus TTS is a state-of-the-art, Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. The model has been fine-tuned to deliver human-level speech synthesis, achieving exceptional clarity, expressiveness, and real-time streaming performance.

sesame/csm-1b
featured · $7.00 per M characters
  • text-to-speech

CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs. The model architecture employs a Llama backbone and a smaller audio decoder that produces Mimi audio codes.

microsoft/Phi-4-multimodal-instruct
featured · bfloat16 · 128k context · $0.05/$0.10 in/out per Mtoken
  • text-generation

Phi-4-multimodal-instruct is a lightweight open multimodal foundation model that leverages the language, vision, and speech research and datasets used for the Phi-3.5 and 4.0 models. The model processes text, image, and audio inputs, generates text outputs, and comes with a 128K-token context length. It underwent an enhancement process incorporating supervised fine-tuning, direct preference optimization (DPO), and reinforcement learning from human feedback (RLHF) to support precise instruction adherence and safety measures. The languages each modality supports are:
  • Text: Arabic, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Hebrew, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish, Ukrainian
  • Vision: English
  • Audio: English, Chinese, German, French, Italian, Japanese, Spanish, Portuguese
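Since Phi-4-multimodal-instruct accepts image input alongside text, a request would carry a multi-part message body. The shape below is a hedged sketch of the OpenAI-style content-parts convention (the `image_url` field name and inline data-URL encoding follow that format and are assumptions here, not something this listing specifies):

```python
import base64

def build_multimodal_message(prompt: str, image_bytes: bytes,
                             mime: str = "image/png") -> dict:
    """Build one user message with a text part and an inline base64 image part,
    following the OpenAI-style content-parts convention (assumed here)."""
    data_url = f"data:{mime};base64," + base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }

# Truncated placeholder bytes for illustration; pass real image bytes in practice.
msg = build_multimodal_message("Describe this image.", b"\x89PNG...")
print(msg["content"][0]["text"])
```

Audio input would follow the same multi-part pattern with an audio content part; consult the official API documentation for the exact field names.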