We use essential cookies to make our site work. With your consent, we may also use non-essential cookies to improve user experience and analyze website traffic…

FLUX.2 is live! High-fidelity image generation made simple.

Browse deepinfra models:

All categories and models you can try out and directly use in deepinfra:

text-generation

automatic-speech-recognition

zero-shot-image-classification

featured

text-generation

Llama-4-Maverick-17B-128E-Instruct-FP8

meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 cover image

The Llama 4 collection of models are natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. Llama 4 Maverick, a 17 billion parameter model with 128 experts

$0.15 in, $0.60 out / 1M

featured

text-generation

Llama-4-Scout-17B-16E-Instruct

meta-llama/Llama-4-Scout-17B-16E-Instruct cover image

The Llama 4 collection of models are natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. Llama 4 Scout, a 17 billion parameter model with 16 experts

$0.08 in, $0.30 out / 1M

featured

text-generation

DeepSeek-R1-0528

deepseek-ai/DeepSeek-R1-0528 cover image

The DeepSeek R1 model has undergone a minor version upgrade, with the current version being DeepSeek-R1-0528.

$0.40 cached, $0.50 in, $2.15 out / 1M

featured

text-generation

DeepSeek-V3-0324

deepseek-ai/DeepSeek-V3-0324 cover image

DeepSeek-V3-0324, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token, an improved iteration over DeepSeek-V3.

$0.106 cached, $0.20 in, $0.88 out / 1M

featured

text-generation

Mistral-Small-3.2-24B-Instruct-2506

mistralai/Mistral-Small-3.2-24B-Instruct-2506 cover image

Mistral-Small-3.2-24B-Instruct is a drop-in upgrade over the 3.1 release, with markedly better instruction following, roughly half the infinite-generation errors, and a more robust function-calling interface—while otherwise matching or slightly improving on all previous text and vision benchmarks.

$0.075 in, $0.20 out / 1M

featured

text-generation

Llama-Guard-4-12B

meta-llama/Llama-Guard-4-12B cover image

Llama Guard 4 is a natively multimodal safety classifier with 12 billion parameters trained jointly on text and multiple images. Llama Guard 4 is a dense architecture pruned from the Llama 4 Scout pre-trained model and fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification) and in LLM responses (response classification). It itself acts as an LLM: it generates text in its output that indicates whether a given prompt or response is safe or unsafe, and if unsafe, it also lists the content categories violated.

$0.18 / 1M tokens

featured

text-generation

anthropic/claude-4-opus cover image

Anthropic’s most powerful model yet and the state-of-the-art coding model. It delivers sustained performance on long-running tasks that require focused effort and thousands of steps, significantly expanding what AI agents can solve. Claude Opus 4 is ideal for powering frontier agent products and features.

$16.50 in, $82.50 out / 1M

featured

text-generation

claude-4-sonnet

anthropic/claude-4-sonnet cover image

Anthropic's mid-size model with superior intelligence for high-volume uses in coding, in-depth research, agents, & more.

$3.30 in, $16.50 out / 1M

featured

text-generation

gemini-2.5-flash

google/gemini-2.5-flash cover image

Gemini 2.5 Flash is Google's latest thinking model, designed to tackle increasingly complex problems. It's capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy. Gemini 2.5 Flash: best for balancing reasoning and speed.

$0.30 in, $2.50 out / 1M

featured

text-generation

google/gemini-2.5-pro cover image

Gemini 2.5 Pro is Google's the most advanced thinking model, designed to tackle increasingly complex problems. Gemini 2.5 Pro leads common benchmarks by meaningful margins and showcases strong reasoning and code capabilities. Gemini 2.5 models are thinking models, capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy. The Gemini 2.5 Pro model is now available on DeepInfra.

$1.25 in, $10.00 out / 1M

featured

text-generation

google/gemma-3-27b-it cover image

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open source model, successor to Gemma 2

$0.09 in, $0.16 out / 1M

featured

text-generation

google/gemma-3-12b-it cover image

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3-12B is Google's latest open source model, successor to Gemma 2

$0.04 in, $0.13 out / 1M

featured

text-generation

google/gemma-3-4b-it cover image

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3-12B is Google's latest open source model, successor to Gemma 2

$0.04 in, $0.08 out / 1M

featured

hexgrad/Kokoro-82M cover image

Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.

$0.62 per 1M characters

featured

orpheus-3b-0.1-ft

canopylabs/orpheus-3b-0.1-ft cover image

Orpheus TTS is a state-of-the-art, Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. This model has been finetuned to deliver human-level speech synthesis, achieving exceptional clarity, expressiveness, and real-time streaming performances.

$7.00 per 1M characters

featured

sesame/csm-1b cover image

CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs. The model architecture employs a Llama backbone and a smaller audio decoder that produces Mimi audio codes.

$7.00 per 1M characters

featured

text-generation

DeepSeek-R1-Distill-Llama-70B

deepseek-ai/DeepSeek-R1-Distill-Llama-70B cover image

DeepSeek-R1-Distill-Llama-70B is a highly efficient language model that leverages knowledge distillation to achieve state-of-the-art performance. This model distills the reasoning patterns of larger models into a smaller, more agile architecture, resulting in exceptional results on benchmarks like AIME 2024, MATH-500, and LiveCodeBench. With 70 billion parameters, DeepSeek-R1-Distill-Llama-70B offers a unique balance of accuracy and efficiency, making it an ideal choice for a wide range of natural language processing tasks.

$0.60 in, $1.20 out / 1M

featured

text-generation

deepseek-ai/DeepSeek-V3 cover image

DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2.

$0.32 in, $0.89 out / 1M

featured

text-generation

Llama-3.3-70B-Instruct-Turbo

meta-llama/Llama-3.3-70B-Instruct-Turbo cover image

Llama 3.3-70B Turbo is a highly optimized version of the Llama 3.3-70B model, utilizing FP8 quantization to deliver significantly faster inference speeds with a minor trade-off in accuracy. The model is designed to be helpful, safe, and flexible, with a focus on responsible deployment and mitigating potential risks such as bias, toxicity, and misinformation. It achieves state-of-the-art performance on various benchmarks, including conversational tasks, language translation, and text generation.

$0.10 in, $0.32 out / 1M

featured

text-generation

microsoft/phi-4 cover image

Phi-4 is a model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small capable models were trained with data focused on high quality and advanced reasoning.

$0.07 in, $0.14 out / 1M

featured

speech-recognition

whisper-large-v3-turbo

openai/whisper-large-v3-turbo cover image

Whisper is a state-of-the-art model for automatic speech recognition (ASR) and speech translation, proposed in the paper "Robust Speech Recognition via Large-Scale Weak Supervision" by Alec Radford et al. from OpenAI. Trained on >5M hours of labeled data, Whisper demonstrates a strong ability to generalise to many datasets and domains in a zero-shot setting. Whisper large-v3-turbo is a finetuned version of a pruned Whisper large-v3. In other words, it's the exact same model, except that the number of decoding layers have reduced from 32 to 4. As a result, the model is way faster, at the expense of a minor quality degradation.

$0.00020 / minute

bge-base-en-v1.5

BAAI/bge-base-en-v1.5 cover image

BGE embedding is a general Embedding Model. It is pre-trained using retromae and trained on large-scale pair data using contrastive learning. Note that the goal of pre-training is to reconstruct the text, and the pre-trained model cannot be used for similarity calculation directly, it needs to be fine-tuned

$0.005 / 1M tokens

BAAI/bge-en-icl cover image

A LLM-based embedding model with in-context learning capabilities that achieves SOTA performance on BEIR and AIR-Bench. It leverages few-shot examples to enhance task performance.

$0.010 / 1M tokens

bge-large-en-v1.5

BAAI/bge-large-en-v1.5 cover image

BGE embedding is a general Embedding Model. It is pre-trained using retromae and trained on large-scale pair data using contrastive learning. Note that the goal of pre-training is to reconstruct the text, and the pre-trained model cannot be used for similarity calculation directly, it needs to be fine-tuned

$0.010 / 1M tokens

SOC 2 Certified

ISO 27001 Certified

Have questions or need a custom solution?

Company

Latest Models

anthropic/claude-3-7-sonnet-latest moonshotai/Kimi-K2-Instruct-0905 deepseek-ai/DeepSeek-V3.2-Exp zai-org/GLM-4.6 deepseek-ai/DeepSeek-V3.1

Featured Models

deepseek-ai/DeepSeek-V3.2 deepseek-ai/DeepSeek-V3.1 openai/gpt-oss-120b Qwen/Qwen3-14B deepseek-ai/DeepSeek-V3-0324

Built With Love in Palo Alto

© 2026 Deep Infra. All rights reserved.

Privacy Policy Terms of Service