
Browse deepinfra models:

All categories and models you can try out and use directly on deepinfra:

Qwen/Qwen3-235B-A22B
featured · fp8 · 40k context · $0.20/$0.60 in/out per Mtoken
  • text-generation

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support.
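Any model in this catalog is addressed by its listed ID. As a minimal sketch, assuming the OpenAI-compatible chat-completions convention that deepinfra exposes (the endpoint URL and payload shape here are assumptions; check the official docs before relying on them), a request body might be built like this:

```python
import json

# Assumed OpenAI-compatible endpoint; verify against deepinfra's documentation.
BASE_URL = "https://api.deepinfra.com/v1/openai/chat/completions"

def build_chat_payload(model_id: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completions payload for a given model ID."""
    return {
        "model": model_id,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_payload("Qwen/Qwen3-235B-A22B",
                             "Explain MoE routing in one paragraph.")
print(json.dumps(payload, indent=2))
```

The same payload shape works for every text-generation model below; only the `model` field changes.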

Qwen/Qwen3-30B-A3B
featured · fp8 · 40k context · $0.10/$0.30 in/out per Mtoken
  • text-generation

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support.

Qwen/Qwen3-32B
featured · fp8 · 40k context · $0.10/$0.30 in/out per Mtoken
  • text-generation

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support.

Qwen/Qwen3-14B
featured · fp8 · 40k context · $0.08/$0.24 in/out per Mtoken
  • text-generation

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support.

meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
featured · fp8 · 1024k context · $0.17/$0.60 in/out per Mtoken
  • text-generation

The Llama 4 collection is a family of natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. Llama 4 Maverick is a 17-billion-parameter model with 128 experts.

meta-llama/Llama-4-Scout-17B-16E-Instruct
featured · bfloat16 · 320k context · $0.08/$0.30 in/out per Mtoken
  • text-generation

The Llama 4 collection is a family of natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. Llama 4 Scout is a 17-billion-parameter model with 16 experts.

deepseek-ai/DeepSeek-R1-Turbo
featured · fp4 · 32k context · $1.00/$3.00 in/out per Mtoken
  • text-generation

We introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.

deepseek-ai/DeepSeek-R1
featured · fp8 · 160k context · $0.50/$2.18 in/out per Mtoken
  • text-generation

We introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.

Qwen/QwQ-32B
featured · bfloat16 · 128k context · $0.15/$0.20 in/out per Mtoken
  • text-generation

QwQ is the reasoning model of the Qwen series. Compared with conventional instruction-tuned models, QwQ, which is capable of thinking and reasoning, can achieve significantly enhanced performance in downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model, capable of competitive performance against state-of-the-art reasoning models such as DeepSeek-R1 and o1-mini.

deepseek-ai/DeepSeek-V3-0324
featured · fp8 · 160k context · $0.30/$0.88 in/out per Mtoken
  • text-generation

DeepSeek-V3-0324 is a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated per token; it is an improved iteration of DeepSeek-V3.
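The per-Mtoken prices above translate into per-request cost with simple arithmetic: at DeepSeek-V3-0324's listed $0.30/$0.88 rates, a call with 10,000 input tokens and 2,000 output tokens costs 10,000/1,000,000 × $0.30 + 2,000/1,000,000 × $0.88 ≈ $0.0048. A small helper (rates taken from the listing above; the token counts are illustrative):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price_per_mtok: float, out_price_per_mtok: float) -> float:
    """Return the USD cost of one request at the listed in/out per-Mtoken rates."""
    return (input_tokens / 1_000_000) * in_price_per_mtok \
         + (output_tokens / 1_000_000) * out_price_per_mtok

# DeepSeek-V3-0324 rates from the listing: $0.30 in / $0.88 out per Mtoken.
cost = request_cost(10_000, 2_000, 0.30, 0.88)
print(f"${cost:.4f}")  # → $0.0048
```

The same helper applies to every per-Mtoken model above; the per-character text-to-speech models below are priced per million input characters instead.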

google/gemma-3-27b-it
featured · bfloat16 · 128k context · $0.10/$0.20 in/out per Mtoken
  • text-generation

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open-source model and the successor to Gemma 2.

google/gemma-3-12b-it
featured · bfloat16 · 128k context · $0.05/$0.10 in/out per Mtoken
  • text-generation

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 12B is Google's latest open-source model and the successor to Gemma 2.

google/gemma-3-4b-it
featured · bfloat16 · 128k context · $0.02/$0.04 in/out per Mtoken
  • text-generation

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 4B is Google's latest open-source model and the successor to Gemma 2.

hexgrad/Kokoro-82M
featured · $0.80 per M characters
  • text-to-speech

Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.

nari-labs/Dia-1.6B
featured · $20.00 per M characters
  • text-to-speech

Dia directly generates highly realistic dialogue from a transcript. You can condition the output on audio, enabling emotion and tone control. The model can also produce nonverbal communications like laughter, coughing, clearing throat, etc.

canopylabs/orpheus-3b-0.1-ft
featured · $7.00 per M characters
  • text-to-speech

Orpheus TTS is a state-of-the-art, Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. The model has been fine-tuned to deliver human-level speech synthesis, achieving exceptional clarity, expressiveness, and real-time streaming performance.

sesame/csm-1b
featured · $7.00 per M characters
  • text-to-speech

CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs. The model architecture employs a Llama backbone and a smaller audio decoder that produces Mimi audio codes.

microsoft/Phi-4-multimodal-instruct
featured · bfloat16 · 128k context · $0.05/$0.10 in/out per Mtoken
  • text-generation

Phi-4-multimodal-instruct is a lightweight open multimodal foundation model that leverages the language, vision, and speech research and datasets used for the Phi-3.5 and 4.0 models. The model processes text, image, and audio inputs, generates text outputs, and comes with a 128K-token context length. It underwent an enhancement process incorporating supervised fine-tuning, direct preference optimization (DPO), and reinforcement learning from human feedback (RLHF) to support precise instruction adherence and safety measures. The languages each modality supports are:
  • Text: Arabic, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Hebrew, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish, Ukrainian
  • Vision: English
  • Audio: English, Chinese, German, French, Italian, Japanese, Spanish, Portuguese
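Since Phi-4-multimodal-instruct accepts image input alongside text, a request would carry a multi-part message body. The shape below is a hedged sketch of the OpenAI-style content-parts convention (the `image_url` field name and inline data-URL encoding follow that format and are assumptions here, not something this listing specifies):

```python
import base64

def build_multimodal_message(prompt: str, image_bytes: bytes,
                             mime: str = "image/png") -> dict:
    """Build one user message with a text part and an inline base64 image part,
    following the OpenAI-style content-parts convention (assumed here)."""
    data_url = f"data:{mime};base64," + base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }

# Truncated placeholder bytes for illustration; pass real image bytes in practice.
msg = build_multimodal_message("Describe this image.", b"\x89PNG...")
print(msg["content"][0]["text"])
```

Audio input would follow the same multi-part pattern with an audio content part; consult the official API documentation for the exact field names.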