Browse DeepInfra models

All categories and models you can try out and use directly on DeepInfra:

deepseek-ai/DeepSeek-R1-Turbo
featured
fp4
32k
$2.00/$6.00 in/out Mtoken
  • text-generation

We introduce DeepSeek-R1, which incorporates cold-start data before reinforcement learning (RL). DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.
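The pricing lines above quote separate input and output rates per million tokens ("Mtoken"). As a quick sketch (the helper name is ours, not part of DeepInfra's API), the cost of a single request can be estimated like this:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  in_rate: float, out_rate: float) -> float:
    """Estimate request cost in USD, given per-Mtoken input/output rates."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# DeepSeek-R1-Turbo at $2.00 in / $6.00 out per Mtoken:
# a 10k-token prompt with a 2k-token answer costs
# 10_000 * 2.00/1e6 + 2_000 * 6.00/1e6 = $0.032
print(f"${estimate_cost(10_000, 2_000, 2.00, 6.00):.3f}")
```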

deepseek-ai/DeepSeek-R1
featured
fp8
64k
$0.75/$2.40 in/out Mtoken
  • text-generation

We introduce DeepSeek-R1, which incorporates cold-start data before reinforcement learning (RL). DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.
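All of the text-generation models listed here can be called through DeepInfra's OpenAI-compatible chat-completions endpoint, using the model identifier exactly as shown in the listing. A minimal stdlib sketch, assuming the endpoint URL below and a `DEEPINFRA_API_KEY` (both should be checked against DeepInfra's own docs):

```python
import json
import urllib.request  # used only in the commented-out request below

# Assumed OpenAI-compatible endpoint; verify against DeepInfra's documentation.
API_URL = "https://api.deepinfra.com/v1/openai/chat/completions"

def build_chat_request(model: str, prompt: str, max_tokens: int = 512) -> dict:
    """Build an OpenAI-style chat-completions payload for a listed model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("deepseek-ai/DeepSeek-R1",
                             "Prove that sqrt(2) is irrational.")

# Actually sending it needs a real API key, so the POST stays commented:
# req = urllib.request.Request(
#     API_URL,
#     data=json.dumps(payload).encode(),
#     headers={"Authorization": "Bearer <DEEPINFRA_API_KEY>",
#              "Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
print(payload["model"])
```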

Qwen/QwQ-32B
featured
bfloat16
128k
$0.12/$0.18 in/out Mtoken
  • text-generation

QwQ is the reasoning model of the Qwen series. Compared with conventional instruction-tuned models, QwQ, which is capable of thinking and reasoning, can achieve significantly enhanced performance in downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model, which is capable of achieving competitive performance against state-of-the-art reasoning models, e.g., DeepSeek-R1, o1-mini.

google/gemma-3-27b-it
featured
bfloat16
128k
$0.10/$0.20 in/out Mtoken
  • text-generation

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open-source model and the successor to Gemma 2.

microsoft/Phi-4-multimodal-instruct
featured
bfloat16
128k
$0.07/$0.14 in/out Mtoken
  • text-generation

Phi-4-multimodal-instruct is a lightweight open multimodal foundation model that leverages the language, vision, and speech research and datasets used for the Phi-3.5 and 4.0 models. It processes text, image, and audio inputs, generates text outputs, and comes with a 128K-token context length. The model underwent an enhancement process incorporating supervised fine-tuning, direct preference optimization, and RLHF (Reinforcement Learning from Human Feedback) to support precise instruction adherence and safety measures. Each modality supports the following languages:
  • Text: Arabic, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Hebrew, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish, Ukrainian
  • Vision: English
  • Audio: English, Chinese, German, French, Italian, Japanese, Spanish, Portuguese
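For vision-capable entries such as Phi-4-multimodal-instruct and gemma-3-27b-it, OpenAI-compatible APIs typically express mixed text-and-image input as a content-parts message. A sketch of that shape (field names follow the OpenAI chat format; treat the exact structure DeepInfra accepts as an assumption to verify):

```python
def build_vision_message(text: str, image_url: str) -> dict:
    """Build one chat message mixing text and an image, in the
    OpenAI content-parts style used by OpenAI-compatible APIs."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

# Hypothetical image URL, for illustration only.
msg = build_vision_message("What is in this picture?",
                           "https://example.com/cat.png")
print(len(msg["content"]))  # two parts: text + image
```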

deepseek-ai/DeepSeek-R1-Distill-Llama-70B
featured
bfloat16
128k
$0.23/$0.69 in/out Mtoken
  • text-generation

DeepSeek-R1-Distill-Llama-70B is a highly efficient language model that leverages knowledge distillation to achieve state-of-the-art performance. This model distills the reasoning patterns of larger models into a smaller, more agile architecture, resulting in exceptional results on benchmarks like AIME 2024, MATH-500, and LiveCodeBench. With 70 billion parameters, DeepSeek-R1-Distill-Llama-70B offers a unique balance of accuracy and efficiency, making it an ideal choice for a wide range of natural language processing tasks.

deepseek-ai/DeepSeek-V3
featured
fp8
64k
$0.49/$0.89 in/out Mtoken
  • text-generation

DeepSeek-V3 is a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts the Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2.

meta-llama/Llama-3.3-70B-Instruct-Turbo
featured
fp8
128k
$0.12/$0.30 in/out Mtoken
  • text-generation

Llama 3.3-70B Turbo is a highly optimized version of the Llama 3.3-70B model, utilizing FP8 quantization to deliver significantly faster inference speeds with a minor trade-off in accuracy. The model is designed to be helpful, safe, and flexible, with a focus on responsible deployment and mitigating potential risks such as bias, toxicity, and misinformation. It achieves state-of-the-art performance on various benchmarks, including conversational tasks, language translation, and text generation.

meta-llama/Llama-3.3-70B-Instruct
featured
bfloat16
128k
$0.23/$0.40 in/out Mtoken
  • text-generation

Llama 3.3-70B is a multilingual LLM trained on a massive dataset of 15 trillion tokens, fine-tuned for instruction-following and conversational dialogue. The model is designed to be helpful, safe, and flexible, with a focus on responsible deployment and mitigating potential risks such as bias, toxicity, and misinformation. It achieves state-of-the-art performance on various benchmarks, including conversational tasks, language translation, and text generation.

mistralai/Mistral-Small-24B-Instruct-2501
featured
fp8
32k
$0.07/$0.14 in/out Mtoken
  • text-generation

Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed for efficient local deployment. The model achieves 81% accuracy on the MMLU benchmark and performs competitively with larger models like Llama 3.3 70B and Qwen 32B, while operating at three times the speed on equivalent hardware.

deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
featured
fp8
128k
$0.12/$0.18 in/out Mtoken
  • text-generation

DeepSeek R1 Distill Qwen 32B is a distilled large language model based on Qwen 2.5 32B, using outputs from DeepSeek R1. It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models. Other benchmark results include: AIME 2024: 72.6 | MATH-500: 94.3 | CodeForces Rating: 1691.

microsoft/phi-4
featured
bfloat16
16k
$0.07/$0.14 in/out Mtoken
  • text-generation

Phi-4 is a model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small capable models were trained with data focused on high quality and advanced reasoning.

meta-llama/Meta-Llama-3.1-70B-Instruct
featured
bfloat16
128k
$0.23/$0.40 in/out Mtoken
  • text-generation

Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B, 70B, and 405B sizes.

meta-llama/Meta-Llama-3.1-8B-Instruct
featured
bfloat16
128k
$0.03/$0.05 in/out Mtoken
  • text-generation

Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B, 70B, and 405B sizes.

meta-llama/Meta-Llama-3.1-405B-Instruct
featured
fp8
32k
$0.80/Mtoken
  • text-generation

Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B, 70B, and 405B sizes.

meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo
featured
fp8
128k
$0.02/$0.05 in/out Mtoken
  • text-generation

Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B, 70B, and 405B sizes.

meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
featured
fp8
128k
$0.12/$0.30 in/out Mtoken
  • text-generation

Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B, 70B, and 405B sizes.

Qwen/Qwen2.5-Coder-32B-Instruct
featured
fp8
32k
$0.07/$0.16 in/out Mtoken
  • text-generation

Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). It brings significant improvements in code generation, code reasoning, and code fixing, providing a more comprehensive foundation for real-world applications such as code agents, while not only enhancing coding capabilities but also maintaining its strengths in mathematics and general competencies.