Browse Deep Infra models:

All categories and models you can try out and use directly on Deep Infra:

text-generation

automatic-speech-recognition

zero-shot-image-classification

Qwen3-Embedding-8B

Qwen/Qwen3-Embedding-8B

The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embeddings and reranking models in various sizes (0.6B, 4B, and 8B).

$0.050 / 1M tokens

Qwen3-Embedding-8B-batch

Qwen/Qwen3-Embedding-8B-batch

The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embeddings and reranking models in various sizes (0.6B, 4B, and 8B).

$0.025 / 1M tokens

Qwen3-Reranker-0.6B

Qwen/Qwen3-Reranker-0.6B

The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embeddings and reranking models in various sizes (0.6B, 4B, and 8B).

$0.010 / 1M tokens

Qwen3-Reranker-4B

Qwen/Qwen3-Reranker-4B

The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embeddings and reranking models in various sizes (0.6B, 4B, and 8B).

$0.025 / 1M tokens

Qwen3-Reranker-8B

Qwen/Qwen3-Reranker-8B

The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks. Building upon the dense foundational models of the Qwen3 series, it provides a comprehensive range of text embeddings and reranking models in various sizes (0.6B, 4B, and 8B).

$0.050 / 1M tokens

text-generation

Qwen3-VL-235B-A22B-Instruct

Qwen/Qwen3-VL-235B-A22B-Instruct

Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date. This generation delivers comprehensive upgrades across the board: superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities.

$0.30 in, $1.49 out / 1M
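The "in / out" prices quoted for text-generation models are per one million tokens, so the cost of a request is a weighted sum of its input and output token counts. The following is a minimal sketch of that arithmetic, assuming input and output tokens are billed independently at the listed rates. The function name `token_cost` and the token counts are hypothetical and chosen only for illustration.

```python
def token_cost(input_tokens: int, output_tokens: int,
               in_price_per_m: float, out_price_per_m: float) -> float:
    """Estimate the USD cost of one request from per-1M-token prices."""
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000


# Listed prices for Qwen3-VL-235B-A22B-Instruct: $0.30 in, $1.49 out / 1M.
# The token counts below are made up for the example.
cost = token_cost(200_000, 50_000, in_price_per_m=0.30, out_price_per_m=1.49)
print(f"${cost:.4f}")  # prints $0.1345
```

For models listed with a single price per 1M tokens, the same function can be called with that price for the input rate and zero output tokens.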

text-generation

Qwen3-VL-30B-A3B-Instruct

Qwen/Qwen3-VL-30B-A3B-Instruct

Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date. This generation delivers comprehensive upgrades across the board: superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities.

$0.15 in, $0.60 out / 1M

ResembleAI/chatterbox

Chatterbox is Resemble AI's first production-grade open source TTS model. Licensed under MIT, Chatterbox has been benchmarked against leading closed-source systems like ElevenLabs, and is consistently preferred in side-by-side evaluations. Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. It's also the first open source TTS model to support emotion exaggeration control, a powerful feature that makes your voices stand out.

$10.00 per 1M characters

text-generation

L3-8B-Lunaris-v1-Turbo

Sao10K/L3-8B-Lunaris-v1-Turbo

$0.04 in, $0.05 out / 1M

text-generation

L3.1-70B-Euryale-v2.2

Sao10K/L3.1-70B-Euryale-v2.2

Euryale 3.1 - 70B v2.2 is a model from Sao10K focused on creative roleplay.

$0.85 / 1M tokens

text-generation

L3.3-70B-Euryale-v2.3

Sao10K/L3.3-70B-Euryale-v2.3

L3.3-70B-Euryale-v2.3 is a model from Sao10K focused on creative roleplay.

$0.85 / 1M tokens

Wan2.1-T2V-1.3B

Wan-AI/Wan2.1-T2V-1.3B

The Wan2.1 1.3B model is a lightweight, efficient text-to-video generator. Despite its compact size, it delivers impressive performance across benchmarks and generates high-quality 480P videos.

Wan-AI/Wan2.1-T2V-14B

The Wan2.1 14B model is a high-capacity, state-of-the-art video foundation model capable of producing both 480P and 720P videos. It excels at capturing complex prompts and generating visually rich, detailed scenes, making it ideal for high-end creative tasks.

Zonos-v0.1-hybrid

Zyphra/Zonos-v0.1-hybrid

Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS providers. Our model enables highly natural speech generation from text prompts when given a speaker embedding or audio prefix, and can accurately perform speech cloning when given a reference clip spanning just a few seconds. The conditioning setup also allows for fine control over speaking rate, pitch variation, audio quality, and emotions such as happiness, fear, sadness, and anger. The model outputs speech natively at 44kHz.

$7.00 per 1M characters

Zonos-v0.1-transformer

Zyphra/Zonos-v0.1-transformer

Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS providers. Our model enables highly natural speech generation from text prompts when given a speaker embedding or audio prefix, and can accurately perform speech cloning when given a reference clip spanning just a few seconds. The conditioning setup also allows for fine control over speaking rate, pitch variation, audio quality, and emotions such as happiness, fear, sadness, and anger. The model outputs speech natively at 44kHz.

$7.00 per 1M characters

text-generation

claude-3-7-sonnet-latest

anthropic/claude-3-7-sonnet-latest

$0.33 cached, $3.30 in, $16.50 out / 1M
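This entry lists a third, lower "cached" rate alongside the input and output rates. The sketch below shows one way such a three-tier price could be applied. It assumes, and the listing does not confirm this, that cached input tokens are billed at the cached rate in place of the regular input rate. The function name `cached_token_cost` and the token counts are hypothetical.

```python
def cached_token_cost(cached_tokens: int, fresh_input_tokens: int,
                      output_tokens: int, cached_price: float,
                      in_price: float, out_price: float) -> float:
    """Estimate USD cost when part of the input is billed at a cached rate.

    All prices are per 1M tokens. Assumption: cached tokens are charged
    at `cached_price` instead of `in_price`.
    """
    total = (cached_tokens * cached_price
             + fresh_input_tokens * in_price
             + output_tokens * out_price)
    return total / 1_000_000


# Listed prices: $0.33 cached, $3.30 in, $16.50 out / 1M.
# Token counts are invented for illustration.
cost = cached_token_cost(100_000, 20_000, 5_000,
                         cached_price=0.33, in_price=3.30, out_price=16.50)
print(f"${cost:.4f}")  # prints $0.1815
```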

FLUX-1-Redux-dev

black-forest-labs/FLUX-1-Redux-dev

FLUX.1 Redux [dev] is an image variation generation adapter for all FLUX.1 base models. It enables users to refine images with slight variations and supports text-based restyling via API. Integrated with FLUX1.1 [pro] Ultra, it allows for high-quality 4-megapixel outputs. The model can be used with Diffusers in Python for efficient image generation. While powerful, it has ethical and factual limitations and is governed by a non-commercial license.

$0.012 x (width / 1024) x (height / 1024) x (iters / 25)
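Read literally, this pricing formula scales a base price by the image area relative to 1024 x 1024 pixels and by the number of iterations relative to 25. The sketch below evaluates it under that reading. The function name `image_cost` and the example resolutions are hypothetical, and the result is only as accurate as this interpretation of the listed formula.

```python
def image_cost(base_price: float, width: int, height: int, iters: int,
               reference_iters: int = 25) -> float:
    """Scale a base price by image area (vs. 1024x1024) and step count."""
    return (base_price
            * (width / 1024)
            * (height / 1024)
            * (iters / reference_iters))


# Base price listed for FLUX-1-Redux-dev: $0.012.
default = image_cost(0.012, 1024, 1024, 25)  # every factor is 1
larger = image_cost(0.012, 2048, 2048, 50)   # 2 x 2 x 2 = 8 times the base

print(f"${default:.3f}")  # prints $0.012
print(f"${larger:.3f}")   # prints $0.096
```

Doubling both dimensions quadruples the cost, and doubling the iteration count doubles it again.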

black-forest-labs/FLUX-1-dev

FLUX.1-dev is a state-of-the-art 12 billion parameter rectified flow transformer developed by Black Forest Labs. This model excels in text-to-image generation, providing highly accurate and detailed outputs. It is particularly well-regarded for its ability to follow complex prompts and generate anatomically accurate images, especially with challenging details like hands and faces.

$0.009 x (width / 1024) x (height / 1024) x (iters / 25)

black-forest-labs/FLUX-1-schnell

FLUX.1 [schnell] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions. This model offers cutting-edge output quality and competitive prompt following, matching the performance of closed source alternatives. Trained using latent adversarial diffusion distillation, FLUX.1 [schnell] can generate high-quality images in only 1 to 4 steps.

$0.0005 x (width / 1024) x (height / 1024) x iters
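Unlike the formulas that divide the iteration count by 25, this one multiplies by the raw number of iterations, so the listed price acts as a per-step rate at 1024 x 1024. The sketch below evaluates it for the 1 to 4 steps mentioned in the description. The function name `schnell_cost` is hypothetical, and the code assumes the formula is applied exactly as written.

```python
def schnell_cost(width: int, height: int, iters: int,
                 price_per_iter: float = 0.0005) -> float:
    """Per-step price scaled by image area relative to 1024x1024."""
    return price_per_iter * (width / 1024) * (height / 1024) * iters


# Cost of a 1024x1024 image at the low and high end of the step range.
one_step = schnell_cost(1024, 1024, 1)
four_steps = schnell_cost(1024, 1024, 4)

print(f"${one_step:.4f}")    # prints $0.0005
print(f"${four_steps:.4f}")  # prints $0.0020
```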

black-forest-labs/FLUX-1.1-pro

Black Forest Labs' latest state-of-the-art proprietary model, with top-of-the-line prompt following, visual quality, detail, and output diversity.

black-forest-labs/FLUX-pro

Black Forest Labs' first flagship model, based on Flux latent rectified flow transformers.

FLUX.1-Kontext-dev

black-forest-labs/FLUX.1-Kontext-dev

FLUX.1 Kontext [dev] is a 12-billion-parameter image editing model that transforms visuals based on natural language instructions. It allows highly consistent, multi-step edits and is released with open weights under a non-commercial license to empower artists and researchers.

$0.01 x (width / 1024) x (height / 1024) x (iters / 25)

text-generation

deepseek-ai/DeepSeek-R1

We introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.

$0.70 in, $2.40 out / 1M

text-generation

DeepSeek-R1-Turbo

deepseek-ai/DeepSeek-R1-Turbo

We introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.

$1.00 in, $3.00 out / 1M

© 2025 Deep Infra. All rights reserved.