Browse DeepInfra models

All categories and models you can try out and use directly on DeepInfra:

deepseek-ai/DeepSeek-R1-Turbo
featured
fp4
32k
$2.00/$6.00 in/out Mtoken
  • text-generation

We introduce DeepSeek-R1, which incorporates cold-start data before reinforcement learning (RL). DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.
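The pricing lines above quote separate input and output rates per million tokens ("Mtoken"). As a quick sketch (the helper name is ours, not part of DeepInfra's API), the cost of a single request can be estimated like this:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  in_rate: float, out_rate: float) -> float:
    """Estimate request cost in USD, given per-Mtoken input/output rates."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# DeepSeek-R1-Turbo at $2.00 in / $6.00 out per Mtoken:
# a 10k-token prompt with a 2k-token answer costs
# 10_000 * 2.00/1e6 + 2_000 * 6.00/1e6 = $0.032
print(f"${estimate_cost(10_000, 2_000, 2.00, 6.00):.3f}")
```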

deepseek-ai/DeepSeek-R1
featured
fp8
64k
$0.75/$2.40 in/out Mtoken
  • text-generation

We introduce DeepSeek-R1, which incorporates cold-start data before reinforcement learning (RL). DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.
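All of the text-generation models listed here can be called through DeepInfra's OpenAI-compatible chat-completions endpoint, using the model identifier exactly as shown in the listing. A minimal stdlib sketch, assuming the endpoint URL below and a `DEEPINFRA_API_KEY` (both should be checked against DeepInfra's own docs):

```python
import json
import urllib.request  # used only in the commented-out request below

# Assumed OpenAI-compatible endpoint; verify against DeepInfra's documentation.
API_URL = "https://api.deepinfra.com/v1/openai/chat/completions"

def build_chat_request(model: str, prompt: str, max_tokens: int = 512) -> dict:
    """Build an OpenAI-style chat-completions payload for a listed model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("deepseek-ai/DeepSeek-R1",
                             "Prove that sqrt(2) is irrational.")

# Actually sending it needs a real API key, so the POST stays commented:
# req = urllib.request.Request(
#     API_URL,
#     data=json.dumps(payload).encode(),
#     headers={"Authorization": "Bearer <DEEPINFRA_API_KEY>",
#              "Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
print(payload["model"])
```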

Qwen/QwQ-32B
featured
bfloat16
128k
$0.12/$0.18 in/out Mtoken
  • text-generation

QwQ is the reasoning model of the Qwen series. Compared with conventional instruction-tuned models, QwQ, which is capable of thinking and reasoning, can achieve significantly enhanced performance in downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model, which is capable of achieving competitive performance against state-of-the-art reasoning models, e.g., DeepSeek-R1, o1-mini.

google/gemma-3-27b-it
featured
bfloat16
128k
$0.10/$0.20 in/out Mtoken
  • text-generation

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open-source model and the successor to Gemma 2.

microsoft/Phi-4-multimodal-instruct
featured
bfloat16
128k
$0.07/$0.14 in/out Mtoken
  • text-generation

Phi-4-multimodal-instruct is a lightweight open multimodal foundation model that leverages the language, vision, and speech research and datasets used for the Phi-3.5 and 4.0 models. It processes text, image, and audio inputs, generates text outputs, and comes with a 128K-token context length. The model underwent an enhancement process incorporating supervised fine-tuning, direct preference optimization, and RLHF (Reinforcement Learning from Human Feedback) to support precise instruction adherence and safety measures. Each modality supports the following languages:
  • Text: Arabic, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Hebrew, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish, Ukrainian
  • Vision: English
  • Audio: English, Chinese, German, French, Italian, Japanese, Spanish, Portuguese
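For vision-capable entries such as Phi-4-multimodal-instruct and gemma-3-27b-it, OpenAI-compatible APIs typically express mixed text-and-image input as a content-parts message. A sketch of that shape (field names follow the OpenAI chat format; treat the exact structure DeepInfra accepts as an assumption to verify):

```python
def build_vision_message(text: str, image_url: str) -> dict:
    """Build one chat message mixing text and an image, in the
    OpenAI content-parts style used by OpenAI-compatible APIs."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

# Hypothetical image URL, for illustration only.
msg = build_vision_message("What is in this picture?",
                           "https://example.com/cat.png")
print(len(msg["content"]))  # two parts: text + image
```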

deepseek-ai/DeepSeek-R1-Distill-Llama-70B
featured
bfloat16
128k
$0.23/$0.69 in/out Mtoken
  • text-generation

DeepSeek-R1-Distill-Llama-70B is a highly efficient language model that leverages knowledge distillation to achieve state-of-the-art performance. This model distills the reasoning patterns of larger models into a smaller, more agile architecture, resulting in exceptional results on benchmarks like AIME 2024, MATH-500, and LiveCodeBench. With 70 billion parameters, DeepSeek-R1-Distill-Llama-70B offers a unique balance of accuracy and efficiency, making it an ideal choice for a wide range of natural language processing tasks.

deepseek-ai/DeepSeek-V3
featured
fp8
64k
$0.49/$0.89 in/out Mtoken
  • text-generation

DeepSeek-V3 is a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts the Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2.

meta-llama/Llama-3.3-70B-Instruct-Turbo
featured
fp8
128k
$0.12/$0.30 in/out Mtoken
  • text-generation

Llama 3.3-70B Turbo is a highly optimized version of the Llama 3.3-70B model, utilizing FP8 quantization to deliver significantly faster inference speeds with a minor trade-off in accuracy. The model is designed to be helpful, safe, and flexible, with a focus on responsible deployment and mitigating potential risks such as bias, toxicity, and misinformation. It achieves state-of-the-art performance on various benchmarks, including conversational tasks, language translation, and text generation.

meta-llama/Llama-3.3-70B-Instruct
featured
bfloat16
128k
$0.23/$0.40 in/out Mtoken
  • text-generation

Llama 3.3-70B is a multilingual LLM trained on a massive dataset of 15 trillion tokens, fine-tuned for instruction-following and conversational dialogue. The model is designed to be helpful, safe, and flexible, with a focus on responsible deployment and mitigating potential risks such as bias, toxicity, and misinformation. It achieves state-of-the-art performance on various benchmarks, including conversational tasks, language translation, and text generation.

mistralai/Mistral-Small-24B-Instruct-2501
featured
fp8
32k
$0.07/$0.14 in/out Mtoken
  • text-generation

Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed for efficient local deployment. The model achieves 81% accuracy on the MMLU benchmark and performs competitively with larger models like Llama 3.3 70B and Qwen 32B, while operating at three times the speed on equivalent hardware.

deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
featured
fp8
128k
$0.12/$0.18 in/out Mtoken
  • text-generation

DeepSeek R1 Distill Qwen 32B is a distilled large language model based on Qwen 2.5 32B, using outputs from DeepSeek R1. It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models. Other benchmark results include: AIME 2024: 72.6 | MATH-500: 94.3 | CodeForces Rating: 1691.

microsoft/phi-4
featured
bfloat16
16k
$0.07/$0.14 in/out Mtoken
  • text-generation

Phi-4 is a model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small capable models were trained with data focused on high quality and advanced reasoning.

meta-llama/Meta-Llama-3.1-70B-Instruct
featured
bfloat16
128k
$0.23/$0.40 in/out Mtoken
  • text-generation

Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B, 70B, and 405B sizes.

meta-llama/Meta-Llama-3.1-8B-Instruct
featured
bfloat16
128k
$0.03/$0.05 in/out Mtoken
  • text-generation

Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B, 70B, and 405B sizes.

meta-llama/Meta-Llama-3.1-405B-Instruct
featured
fp8
32k
$0.80/Mtoken
  • text-generation

Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B, 70B, and 405B sizes.

meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo
featured
fp8
128k
$0.02/$0.05 in/out Mtoken
  • text-generation

Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B, 70B, and 405B sizes.

meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
featured
fp8
128k
$0.12/$0.30 in/out Mtoken
  • text-generation

Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B, 70B, and 405B sizes.

Qwen/Qwen2.5-Coder-32B-Instruct
featured
fp8
32k
$0.07/$0.16 in/out Mtoken
  • text-generation

Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). It brings significant improvements in code generation, code reasoning, and code fixing, providing a more comprehensive foundation for real-world applications such as code agents, while not only enhancing coding capabilities but also maintaining its strengths in mathematics and general competencies.