Browse DeepInfra models:

All categories and models you can try out and use directly on DeepInfra:

Category: Featured

Our most popular AI models, used by thousands of users in their apps and research. What will you create today?

black-forest-labs/FLUX-1.1-pro
$0.04 / img
  • text-to-image

Black Forest Labs' latest state-of-the-art proprietary model, offering top-of-the-line prompt following, visual quality, detail, and output diversity.

black-forest-labs/FLUX-1-schnell
$0.0005 x (width / 1024) x (height / 1024) x iters
  • text-to-image

FLUX.1 [schnell] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions. This model offers cutting-edge output quality and competitive prompt following, matching the performance of closed source alternatives. Trained using latent adversarial diffusion distillation, FLUX.1 [schnell] can generate high-quality images in only 1 to 4 steps.
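
The listed schnell price scales linearly with width, height, and iterations. The sketch below simply evaluates that formula; the helper name flux_schnell_cost is hypothetical, and it assumes width and height are in pixels and that "iters" means inference steps (the listing does not define either term). At 1024x1024 the estimate goes from $0.0005 at 1 step to $0.002 at 4 steps, the top of the 1 to 4 step range mentioned above.

```python
# Sketch of the listed FLUX.1 [schnell] pricing formula:
#   $0.0005 x (width / 1024) x (height / 1024) x iters
# Assumption: width/height are pixels and "iters" is the number of inference
# steps; the listing itself does not define these terms.


def flux_schnell_cost(width: int, height: int, iters: int) -> float:
    """Estimated USD cost of one FLUX.1 [schnell] image (hypothetical helper)."""
    return 0.0005 * (width / 1024) * (height / 1024) * iters


# Compare the two ends of the 1-4 step range at 1024x1024.
for steps in (1, 4):
    cost = flux_schnell_cost(1024, 1024, steps)
    print(f"1024x1024, {steps} step(s): ${cost:.4f}")
```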

black-forest-labs/FLUX-1-dev
$0.009 x (width / 1024) x (height / 1024) x (iters / 25)
  • text-to-image

FLUX.1-dev is a state-of-the-art 12 billion parameter rectified flow transformer developed by Black Forest Labs. This model excels in text-to-image generation, providing highly accurate and detailed outputs. It is particularly well-regarded for its ability to follow complex prompts and generate anatomically accurate images, especially with challenging details like hands and faces.
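
Unlike schnell, the FLUX-1-dev formula normalizes iterations by 25, so a 1024x1024 image at 25 iterations costs exactly the base $0.009, and doubling either the pixel count or the step count doubles the price. A minimal sketch of that arithmetic follows; flux_dev_cost is a hypothetical helper, and "iters" is again assumed to mean inference steps.

```python
# Sketch of the listed FLUX.1-dev pricing formula:
#   $0.009 x (width / 1024) x (height / 1024) x (iters / 25)
# Assumption: width/height are pixels and "iters" means inference steps.


def flux_dev_cost(width: int, height: int, iters: int) -> float:
    """Estimated USD cost of one FLUX.1-dev image (hypothetical helper)."""
    return 0.009 * (width / 1024) * (height / 1024) * (iters / 25)


# Base case, doubled steps, and a quarter of the pixel count.
for w, h, steps in [(1024, 1024, 25), (1024, 1024, 50), (512, 512, 25)]:
    print(f"{w}x{h}, {steps} steps: ${flux_dev_cost(w, h, steps):.5f}")
```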

black-forest-labs/FLUX-pro
$0.05 / img
  • text-to-image

Black Forest Labs' first flagship model, based on Flux latent rectified flow transformers.

stabilityai/sd3.5-medium
$0.03 / img
  • text-to-image

With 2.5 billion parameters, an improved MMDiT-X architecture, and improved training methods, this model is designed to run “out of the box” on consumer hardware, striking a balance between quality and ease of customization. It can generate images at resolutions between 0.25 and 2 megapixels.

hexgrad/Kokoro-82M
$0.80 / M characters
  • text-to-speech

Kokoro is a frontier text-to-speech (TTS) model for its size of 82 million parameters (text in, audio out). On 25 Dec 2024, the Kokoro v0.19 weights were permissively released in full fp32 precision under an Apache 2.0 license. As of 2 Jan 2025, 10 unique Voicepacks have been released, and an .onnx version of v0.19 is available.
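
Kokoro is billed per million input characters, so the cost of a synthesis job is just the character count times $0.80 / 1,000,000. The sketch below is hypothetical (kokoro_cost is not a DeepInfra function) and assumes one billed character equals one element of a Python string, i.e. one Unicode code point; the listing does not say how characters are counted. Under that assumption, a 500,000-character manuscript would cost about $0.40.

```python
# Sketch of per-character TTS pricing for hexgrad/Kokoro-82M ($0.80 / M characters).
# Assumption: one billed character == one Python str element (Unicode code point).

PRICE_PER_MILLION_CHARS_USD = 0.80


def kokoro_cost(text: str) -> float:
    """Estimated USD cost to synthesize `text` (hypothetical helper)."""
    return len(text) * PRICE_PER_MILLION_CHARS_USD / 1_000_000


sample = "Hello from Kokoro!"  # 18 characters
print(f"{len(sample)} chars: ${kokoro_cost(sample):.8f}")

manuscript = "x" * 500_000  # stand-in for a 500k-character text
print(f"500k chars: ${kokoro_cost(manuscript):.2f}")
```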

openai/whisper-large-v3-turbo
$0.00020 / minute
  • automatic-speech-recognition

Whisper is a state-of-the-art model for automatic speech recognition (ASR) and speech translation, proposed in the paper "Robust Speech Recognition via Large-Scale Weak Supervision" by Alec Radford et al. from OpenAI. Trained on more than 5M hours of labeled data, Whisper demonstrates a strong ability to generalise to many datasets and domains in a zero-shot setting. Whisper large-v3-turbo is a fine-tuned version of a pruned Whisper large-v3. In other words, it is the same model, except that the number of decoder layers has been reduced from 32 to 4. As a result, the model is much faster, at the cost of a minor drop in quality.

openai/whisper-large-v3
$0.00045 / minute
  • automatic-speech-recognition

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.

distil-whisper/distil-large-v3
$0.00018 / minute
  • automatic-speech-recognition

Distil-Whisper was proposed in the paper "Robust Knowledge Distillation via Large-Scale Pseudo Labelling". This is the third and final installment of the Distil-Whisper English series. It is the knowledge-distilled version of OpenAI's Whisper large-v3, the latest and most performant Whisper model at the time of its release. Compared to previous Distil-Whisper models, the distillation procedure for distil-large-v3 has been adapted to give superior long-form transcription accuracy with OpenAI's sequential long-form algorithm.
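
The three Whisper variants listed above are all priced per minute of audio, which makes them easy to compare for a given workload. The sketch below prorates by the second; that granularity is an assumption (the listing does not say whether partial minutes are rounded up), and transcription_cost is a hypothetical helper. For one hour of audio it gives roughly $0.0108 for distil-large-v3, $0.012 for large-v3-turbo, and $0.027 for large-v3.

```python
# Sketch comparing per-minute ASR prices from the listing.
# Assumption: billing is prorated per second; rounding rules are not stated.

PRICE_PER_MINUTE_USD = {
    "openai/whisper-large-v3-turbo": 0.00020,
    "openai/whisper-large-v3": 0.00045,
    "distil-whisper/distil-large-v3": 0.00018,
}


def transcription_cost(model: str, seconds: float) -> float:
    """Estimated USD cost to transcribe `seconds` of audio (hypothetical helper)."""
    return PRICE_PER_MINUTE_USD[model] * seconds / 60


one_hour = 3600
# Print the models from cheapest to most expensive per minute.
for model in sorted(PRICE_PER_MINUTE_USD, key=PRICE_PER_MINUTE_USD.get):
    print(f"{model}: ${transcription_cost(model, one_hour):.4f} per hour of audio")
```

Price is not the only axis: the descriptions above place distil-large-v3 in the Distil-Whisper English series, whereas Whisper large-v3 is multilingual, so the cheapest option only fits English-only workloads.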

microsoft/WizardLM-2-8x22B
$0.50 / Mtoken
  • text-generation

WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models.
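
Text-generation models such as WizardLM-2 8x22B are priced per million tokens. The listing shows a single rate of $0.50 / Mtoken, so the sketch below assumes that rate applies to both prompt (input) and completion (output) tokens; separate input/output rates, if any, are not shown here. wizardlm_cost is a hypothetical helper, and a request with 1,500 prompt tokens and 500 completion tokens comes out to $0.001.

```python
# Sketch of per-token pricing for microsoft/WizardLM-2-8x22B ($0.50 / Mtoken).
# Assumption: the single listed rate applies to prompt and completion tokens alike.

PRICE_PER_MILLION_TOKENS_USD = 0.50


def wizardlm_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated USD cost of one request (hypothetical helper)."""
    total_tokens = prompt_tokens + completion_tokens
    return total_tokens * PRICE_PER_MILLION_TOKENS_USD / 1_000_000


print(f"${wizardlm_cost(1_500, 500):.4f}")
```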