Browse deepinfra models:

All categories and models you can try out and directly use in deepinfra:

text-generation

automatic-speech-recognition

zero-shot-image-classification

text-generation

anthropic/claude-3-7-sonnet-latest

$0.33 cached, $3.30 in, $16.50 out / 1M

text-generation

google/gemini-1.5-flash

Gemini 1.5 Flash is Google's foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image, audio and video. It's adept at processing visual and text inputs such as photographs, documents, infographics, and screenshots. Gemini 1.5 Flash is designed for high-volume, high-frequency tasks where cost and latency matter.

text-generation

google/gemini-1.5-flash-8b

text-generation

meta-llama/Llama-3.2-11B-Vision-Instruct

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and visual question answering, bridging the gap between language generation and visual reasoning. Pre-trained on a massive dataset of image-text pairs, it performs well in complex, high-accuracy image analysis. Its ability to integrate visual understanding with language processing makes it an ideal solution for industries requiring comprehensive visual-linguistic AI applications, such as content creation, AI-driven customer service, and research.

$0.049 / 1M tokens

text-generation

meta-llama/Llama-3.2-3B-Instruct

The Meta Llama 3.2 collection of multilingual large language models (LLMs) consists of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out).

$0.02 / 1M tokens

text-generation

meta-llama/Meta-Llama-3-8B-Instruct

Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes.

$0.03 in, $0.06 out / 1M

text-generation

meta-llama/Meta-Llama-3.1-70B-Instruct

Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B, 70B, and 405B sizes.

$0.40 / 1M tokens

text-generation

meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo

Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B, 70B, and 405B sizes.

$0.40 / 1M tokens

text-generation

meta-llama/Meta-Llama-3.1-8B-Instruct

Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B, 70B, and 405B sizes.

$0.03 in, $0.05 out / 1M

text-generation

meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo

Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B, 70B, and 405B sizes.

$0.02 in, $0.03 out / 1M

text-generation

microsoft/WizardLM-2-8x22B

WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models.

$0.48 / 1M tokens

text-generation

mistralai/Mistral-Nemo-Instruct-2407

A 12B model trained jointly by Mistral AI and NVIDIA that significantly outperforms existing models of similar or smaller size.

$0.02 in, $0.04 out / 1M

text-generation

mistralai/Mistral-Small-24B-Instruct-2501

Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed for efficient local deployment. The model achieves 81% accuracy on the MMLU benchmark and performs competitively with larger models like Llama 3.3 70B and Qwen 32B, while operating at three times the speed on equivalent hardware.

$0.05 in, $0.08 out / 1M

text-generation

mistralai/Mixtral-8x7B-Instruct-v0.1

Mixtral is a mixture-of-experts (MoE) large language model (LLM) from Mistral AI. It is a state-of-the-art model that combines 8 expert 7B models; during inference, 2 experts are selected. This architecture allows large models to be fast and cheap at inference. Mixtral-8x7B outperforms Llama 2 70B on most benchmarks.

$0.54 / 1M tokens
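
The Mixtral description above says that two of the eight experts are selected during inference. A toy sketch of that kind of top-2 routing is shown below. It is not Mixtral's actual implementation: the names `top2_route` and `moe_forward`, the scalar "experts", and the gate scores are all hypothetical, and normalizing the two selected scores with a softmax is an assumption made for illustration.

```python
import math

def top2_route(gate_logits):
    """Return [(expert_index, weight), ...] for the two highest gate scores.

    Weights come from a softmax over just the two selected scores
    (an illustrative choice, not taken from the listing).
    """
    ranked = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)
    top = ranked[:2]
    peak = max(gate_logits[i] for i in top)  # subtract max for numerical stability
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, experts, gate_logits):
    """Run only the two selected experts and mix their outputs by weight."""
    return sum(w * experts[i](x) for i, w in top2_route(gate_logits))

# Eight hypothetical experts; expert k simply multiplies its input by k.
experts = [lambda x, k=k: k * x for k in range(8)]

# Hypothetical gate scores that favour experts 2 and 5 equally.
gate_logits = [0.0, 0.0, 5.0, 0.0, 0.0, 5.0, 0.0, 0.0]

routes = top2_route(gate_logits)
output = moe_forward(2.0, experts, gate_logits)
```

Only two of the eight expert functions are evaluated per call, which is the property the description credits for fast and cheap inference.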

text-generation

nvidia/Llama-3.1-Nemotron-70B-Instruct

Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM generated responses to user queries. This model reaches Arena Hard of 85.0, AlpacaEval 2 LC of 57.6 and GPT-4-Turbo MT-Bench of 8.98, which are known to be predictive of LMSys Chatbot Arena Elo. As of 16th Oct 2024, this model is #1 on all three automatic alignment benchmarks (verified tab for AlpacaEval 2 LC), edging out strong frontier models such as GPT-4o and Claude 3.5 Sonnet.

$1.20 / 1M tokens

text-generation

nvidia/Llama-3.3-Nemotron-Super-49B-v1.5

Llama-3.3-Nemotron-Super-49B-v1.5 is a large language model (LLM) optimized for advanced reasoning, conversational interactions, retrieval-augmented generation (RAG), and tool-calling tasks. Derived from Meta's Llama-3.3-70B-Instruct, it employs a Neural Architecture Search (NAS) approach, significantly enhancing efficiency and reducing memory requirements.

$0.10 in, $0.40 out / 1M

text-generation

nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL

The model is an auto-regressive vision language model that uses an optimized transformer architecture. The model enables multi-image reasoning and video understanding, along with strong document intelligence, visual Q&A and summarization capabilities.

$0.20 in, $0.60 out / 1M

text-generation

nvidia/NVIDIA-Nemotron-Nano-9B-v2

NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be controlled via a system prompt. If the user prefers the model to provide its final answer without intermediate reasoning traces, it can be configured to do so.

$0.04 in, $0.16 out / 1M

text-generation

openai/gpt-oss-120b-Turbo

$0.15 in, $0.60 out / 1M

text-generation

zai-org/GLM-4.6

Compared with GLM-4.5, GLM-4.6 brings several key improvements. Longer context window: the context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex agentic tasks. Superior coding performance: the model achieves higher scores on code benchmarks and demonstrates better real-world performance in applications such as Claude Code, Cline, Roo Code, and Kilo Code, including improvements in generating visually polished front-end pages. Advanced reasoning: GLM-4.6 shows a clear improvement in reasoning performance and supports tool use during inference, leading to stronger overall capability. More capable agents: GLM-4.6 exhibits stronger performance in tool-using and search-based agents, and integrates more effectively within agent frameworks. Refined writing: the model better aligns with human preferences in style and readability, and performs more naturally in role-playing scenarios.

$0.08 cached, $0.43 in, $1.75 out / 1M

text-generation

zai-org/GLM-4.6V

This model is part of the GLM-V family of models, introduced in the paper GLM-4.1V-Thinking and GLM-4.5V: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning.

$0.30 in, $0.90 out / 1M

text-generation

zai-org/GLM-4.7

GLM-4.7 is a state-of-the-art, multilingual Mixture-of-Experts (MoE) language model designed for complex reasoning, agentic coding, and tool use. Building on its predecessor GLM-4.6, it delivers significant improvements across key benchmarks, including multilingual SWE-bench, Terminal Bench, and reasoning-heavy evaluations like HLE. The model features advanced "Interleaved Thinking" and new "Preserved Thinking" modes, allowing it to reason before actions and maintain consistency across long, multi-turn tasks. With 358 billion parameters, GLM-4.7 excels in generating clean code, modern UI elements, and sophisticated reasoning outputs.

$0.08 cached, $0.43 in, $1.75 out / 1M
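
As a rough illustration of how the per-1M-token prices in this listing translate into a per-request cost, the sketch below multiplies token counts by the listed rates. The two price entries are copied from the listing. The function name `estimate_cost`, the layout of the `PRICES` table, and the billing rule (cached input tokens charged at the "cached" rate, all other input tokens at the "in" rate) are assumptions for illustration and are not taken from DeepInfra's billing documentation.

```python
from decimal import Decimal

# USD per 1M tokens, copied from the listing above.
PRICES = {
    "meta-llama/Meta-Llama-3-8B-Instruct": {
        "in": Decimal("0.03"), "out": Decimal("0.06"),
    },
    "zai-org/GLM-4.7": {
        "cached": Decimal("0.08"), "in": Decimal("0.43"), "out": Decimal("1.75"),
    },
}

MILLION = Decimal(1_000_000)

def estimate_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Estimate the USD cost of a request from token counts.

    Assumption: `input_tokens` counts only uncached input tokens, and
    `cached_tokens` are billed at the "cached" rate when one is listed
    (otherwise at the normal input rate).
    """
    price = PRICES[model]
    cached_rate = price.get("cached", price["in"])
    total = (
        Decimal(input_tokens) * price["in"]
        + Decimal(cached_tokens) * cached_rate
        + Decimal(output_tokens) * price["out"]
    )
    return total / MILLION

llama_cost = estimate_cost("meta-llama/Meta-Llama-3-8B-Instruct",
                           input_tokens=2_000_000, output_tokens=1_000_000)
glm_cost = estimate_cost("zai-org/GLM-4.7",
                         input_tokens=1_000_000, output_tokens=1_000_000,
                         cached_tokens=1_000_000)
```

Under these assumptions, 2M input tokens and 1M output tokens on Meta-Llama-3-8B-Instruct come to $0.12. `Decimal` is used instead of floats so that small per-token amounts add up without rounding drift.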

© 2026 Deep Infra. All rights reserved.
