Browse deepinfra models:

All categories and models you can try out and directly use in deepinfra:
Search

Category/all

Sao10K/L3.3-70B-Euryale-v2.3 cover image
fp8
128k
$0.70/$0.80 in/out Mtoken
  • text-generation

L3.3-70B-Euryale-v2.3 is a model focused on creative roleplay from Sao10k

XpucT/Deliberate cover image
Replaced
  • text-to-image

The Deliberate Model allows for the creation of anything desired, with the potential for better results as the user's knowledge and detail in the prompt increase. The model is ideal for meticulous anatomy artists, creative prompt writers, art designers, and those seeking explicit content.

Zyphra/Zonos-v0.1-hybrid cover image
$7.00 per M characters
  • text-to-speech

Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS providers. Our model enables highly natural speech generation from text prompts when given a speaker embedding or audio prefix, and can accurately perform speech cloning when given a reference clip spanning just a few seconds. The conditioning setup also allows for fine control over speaking rate, pitch variation, audio quality, and emotions such as happiness, fear, sadness, and anger. The model outputs speech natively at 44kHz.

Zyphra/Zonos-v0.1-transformer cover image
$7.00 per M characters
  • text-to-speech

Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS providers. Our model enables highly natural speech generation from text prompts when given a speaker embedding or audio prefix, and can accurately perform speech cloning when given a reference clip spanning just a few seconds. The conditioning setup also allows for fine control over speaking rate, pitch variation, audio quality, and emotions such as happiness, fear, sadness, and anger. The model outputs speech natively at 44kHz.

bigcode/starcoder2-15b-instruct-v0.1 cover image
fp16
Replaced
  • text-generation

We introduce StarCoder2-15B-Instruct-v0.1, the very first entirely self-aligned code Large Language Model (LLM) trained with a fully permissive and transparent pipeline. Our open-source pipeline uses StarCoder2-15B to generate thousands of instruction-response pairs, which are then used to fine-tune StarCoder-15B itself without any human annotations or distilled data from huge and proprietary LLMs.

black-forest-labs/FLUX-1-Redux-dev cover image
$0.012 x (width / 1024) x (height / 1024) x (iters / 25)
  • text-to-image

FLUX.1 Redux [dev] is an image variation generation adapter for all FLUX.1 base models. It enables users to refine images with slight variations and supports text-based restyling via API. Integrated with FLUX1.1 [pro] Ultra, it allows for high-quality 4-megapixel outputs. The model can be used with Diffusers in Python for efficient image generation. While powerful, it has ethical and factual limitations and is governed by a non-commercial license.

cognitivecomputations/dolphin-2.6-mixtral-8x7b cover image
bfloat16
32k
Replaced
  • text-generation

The Dolphin 2.6 Mixtral 8x7b model is a finetuned version of the Mixtral-8x7b model, trained on a variety of data including coding data, for 3 days on 4 A100 GPUs. It is uncensored and requires trust_remote_code. The model is very obedient and good at coding, but not DPO tuned. The dataset has been filtered for alignment and bias. The model is compliant with user requests and can be used for various purposes such as generating code or engaging in general chat.

cognitivecomputations/dolphin-2.9.1-llama-3-70b cover image
bfloat16
8k
Replaced
  • text-generation

Dolphin 2.9.1, a fine-tuned Llama-3-70b model. The new model, trained on filtered data, is more compliant but uncensored. It demonstrates improvements in instruction, conversation, coding, and function calling abilities.

deepinfra/airoboros-70b cover image
fp16
4k
Replaced
  • text-generation

Latest version of the Airoboros model fine-tunned version of llama-2-70b using the Airoboros dataset. This model is currently running jondurbin/airoboros-l2-70b-2.2.1

deepinfra/tts cover image
Deprecated
  • text-to-speech

Text-to-Speech (TTS) technology converts written text into spoken words using advanced speech synthesis. TTS systems are used in applications like virtual assistants, accessibility tools for visually impaired users, and language learning software, enabling seamless human-computer interaction.

deepseek-ai/Janus-Pro-1B cover image
$0.0005 / img
  • text-to-image

Janus-Pro is a novel autoregressive framework that unifies multimodal understanding and generation. It addresses the limitations of previous approaches by decoupling visual encoding into separate pathways, while still utilizing a single, unified transformer architecture for processing. The decoupling not only alleviates the conflict between the visual encoder’s roles in understanding and generation, but also enhances the framework’s flexibility. Janus-Pro surpasses previous unified model and matches or exceeds the performance of task-specific models. The simplicity, high flexibility, and effectiveness of Janus-Pro make it a strong candidate for next-generation unified multimodal models.

deepseek-ai/Janus-Pro-7B cover image
$0.002 / img
  • text-to-image

Janus-Pro is a novel autoregressive framework that unifies multimodal understanding and generation. It addresses the limitations of previous approaches by decoupling visual encoding into separate pathways, while still utilizing a single, unified transformer architecture for processing. The decoupling not only alleviates the conflict between the visual encoder’s roles in understanding and generation, but also enhances the framework’s flexibility. Janus-Pro surpasses previous unified model and matches or exceeds the performance of task-specific models. The simplicity, high flexibility, and effectiveness of Janus-Pro make it a strong candidate for next-generation unified multimodal models.

distil-whisper/distil-large-v3 cover image
Deprecated
  • automatic-speech-recognition

Distil-Whisper was proposed in the paper Robust Knowledge Distillation via Large-Scale Pseudo Labelling. This is the third and final installment of the Distil-Whisper English series. It the knowledge distilled version of OpenAI's Whisper large-v3, the latest and most performant Whisper model to date. Compared to previous Distil-Whisper models, the distillation procedure for distil-large-v3 has been adapted to give superior long-form transcription accuracy with OpenAI's sequential long-form algorithm.

google/codegemma-7b-it cover image
fp16
8k
Replaced
  • text-generation

CodeGemma is a collection of lightweight open code models built on top of Gemma. CodeGemma models are text-to-text and text-to-code decoder-only models and are available as a 7 billion pretrained variant that specializes in code completion and code generation tasks, a 7 billion parameter instruction-tuned variant for code chat and instruction following and a 2 billion parameter pretrained variant for fast code completion.