

Browse deepinfra models:

All categories and models you can try out and directly use in deepinfra:

text-generation

automatic-speech-recognition

zero-shot-image-classification

Bria/video_eraser

Remove unwanted objects or regions from video using a mask, and reconstruct the background with intelligent content-aware fill.

$0.1400 / second

Bria/video_foreground_mask

Automatically identify and segment foreground objects across video frames and generate a mask. No prompts, just a video.

$0.1400 / second

Bria/video_increase_resolution

Increase video resolution up to 8K with advanced AI upscaling. Bring your videos to the big screen, ready for the screens of tomorrow.

$0.1400 / second

Bria/video_mask_by_key_points

Identify and segment objects across video frames using specific coordinate points. Just point in the right direction and the model will figure out by itself which object should be masked.

$0.1400 / second

Bria/video_mask_by_prompt

Identify and segment objects across video frames using a text prompt. The easiest way to create a mask to modify your videos.

$0.1400 / second

Bria/video_remove_background

Light and fast. Remove the background of your videos to bring the foreground elements into focus. No more unwanted distractions.

$0.0042 / second

ByteDance/Seedance-1.5-Pro

ByteDance's Seedance 1.5 Pro is a professional video model using V2A native generation for integrated, synced audio-visual output, enhancing efficiency of professional video creation.

$1.200 / 1M tokens

ByteDance/Seedance-2.0

A new-generation, professional-grade multimodal video creation model that supports video generation with multimodal reference inputs, including images, videos, and audio.

$4.300 / 1M tokens

FastVideo/FastWan-QAD-FP8-1.3B

A fast, compact 480p text-to-video model — 5-second clips (landscape or portrait) from a text prompt. A 3-step, FP8 quantization-aware distillation of Wan2.1-T2V-1.3B by FastVideo (Hao AI Lab).

$0.0025 / second (480p)

Pixverse/Pixverse-6-I2V

PixVerse V6 redefines AI video by shifting from isolated generation to a unified, model-driven workflow. Key upgrades include 15-second durations at 1080p resolution and a multi-shot engine. This transition allows creators to move beyond short clips toward meaningful narrative production and professional-grade marketing assets suitable for 2026 digital distribution standards.

$0.045 / second

Pixverse/Pixverse-6-T2V

PixVerse V6 redefines AI video by shifting from isolated generation to a unified, model-driven workflow. Key upgrades include 15-second durations at 1080p resolution and a multi-shot engine. This transition allows creators to move beyond short clips toward meaningful narrative production and professional-grade marketing assets suitable for 2026 digital distribution standards.

$0.045 / second

Pixverse/Pixverse-T2V

PixVerse's 720p resolution offers a fast and reliable option for generating standard HD videos, ideal for quick previews and social media content where generation speed is prioritized over maximum detail.

Pixverse/Pixverse-T2V-HD

The 1080p high-fidelity mode in PixVerse renders videos with significantly enhanced sharpness and visual clarity, capturing intricate details and providing a crisp, professional-grade quality suitable for more polished projects.

PrunaAI/p-video

Real-time AI video generation from text, images, and audio. Supports up to 1080p at 48 FPS with built-in audio generation, draft mode for 4x faster previews, and prompt upsampling.

Pruna's talking head video generation model. Provide a portrait image and either a speech script or an audio file, and the model generates a realistic video of the person speaking. Supports multiple voices, languages, and output resolutions.

$0.025 / second

Wan-AI/Wan2.2-T2V-A14B

The Wan2.2 T2V A14B is a next-generation 14B-parameter video foundation model by Wan-AI featuring a novel two-stage denoising architecture. It produces 480P videos with improved visual coherence and detail, generating 2 or 5 second clips at 16fps from text prompts.

$0.0360 / second

Wan-AI/Wan2.6-I2V

Turn any image into a video. Intelligent shot scheduling supports multi-shot storytelling, generating narrative videos with consistent subjects, scenes, and atmosphere.

Wan-AI/Wan2.6-T2V

Turn any prompt into a smooth video. Intelligent shot scheduling supports multi-shot storytelling, generating narrative videos with consistent subjects, scenes, and atmosphere.

Wan-AI/Wan2.7-I2V

Generates video content from images while stably preserving details such as subject, style, and text elements. Ensures visual consistency and information fidelity throughout dynamic transitions.

Wan-AI/Wan2.7-R2V

Accurately preserve the look and voice of people or objects from a reference video, supporting multi-reference co-creation.

google/veo-3.1

Veo 3.1 is the latest text-to-video model from Google that generates high-fidelity, cinematic videos with synchronized audio from a simple text prompt. It excels at creating realistic and imaginative scenes with a deep understanding of natural language and visual dynamics.

$0.4000 / second

google/veo-3.1-fast

Veo 3.1 is the latest text-to-video model from Google that generates high-fidelity, cinematic videos with synchronized audio from a simple text prompt. It excels at creating realistic and imaginative scenes with a deep understanding of natural language and visual dynamics.

$0.1500 / second
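
The per-second prices in this listing lend themselves to a quick cost comparison. The following is a minimal sketch, assuming the listed rate is simply multiplied by the number of billed seconds (the listing does not say whether a "second" means output duration or processing time). `RATES_PER_SECOND` and `estimate_cost` are hypothetical names used for illustration only and are not part of any DeepInfra API.

```python
from decimal import Decimal

# Per-second rates in USD, copied from the listing.
# The dictionary itself is an illustrative assumption, not an official price feed.
RATES_PER_SECOND = {
    "google/veo-3.1": Decimal("0.4000"),
    "google/veo-3.1-fast": Decimal("0.1500"),
    "Wan-AI/Wan2.2-T2V-A14B": Decimal("0.0360"),
    "Bria/video_remove_background": Decimal("0.0042"),
}


def estimate_cost(model: str, billed_seconds: int) -> Decimal:
    """Return rate * billed_seconds in USD (hypothetical helper)."""
    if billed_seconds < 0:
        raise ValueError("billed_seconds must be non-negative")
    # Decimal keeps the arithmetic exact for currency-style values.
    return RATES_PER_SECOND[model] * billed_seconds


if __name__ == "__main__":
    for name in RATES_PER_SECOND:
        print(f"{name}: ${estimate_cost(name, 5)} for 5 billed seconds")
```

Under that assumption, five billed seconds of google/veo-3.1 would come to $2.00, against $0.75 for google/veo-3.1-fast. Models priced per 1M tokens are left out because the listing gives no token count per clip.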


© 2026 DeepInfra. All rights reserved.
