Browse deepinfra models:

All categories and models you can try out and directly use in deepinfra:
Search

Category/all

mistralai/Mistral-7B-Instruct-v0.3 cover image
bfloat16
32k
$0.03/$0.055 in/out Mtoken
  • text-generation

Mistral-7B-Instruct-v0.3 is an instruction-tuned model, next iteration of of Mistral 7B that has larger vocabulary, newer tokenizer and supports function calling.

mistralai/Mistral-Nemo-Instruct-2407 cover image
bfloat16
128k
$0.035/$0.08 in/out Mtoken
  • text-generation

12B model trained jointly by Mistral AI and NVIDIA, it significantly outperforms existing models smaller or similar in size.

mistralai/Mixtral-8x22B-Instruct-v0.1 cover image
bfloat16
64k
Replaced
  • text-generation

This is the instruction fine-tuned version of Mixtral-8x22B - the latest and largest mixture of experts large language model (LLM) from Mistral AI. This state of the art machine learning model uses a mixture 8 of experts (MoE) 22b models. During inference 2 experts are selected. This architecture allows large models to be fast and cheap at inference.

mistralai/Mixtral-8x7B-Instruct-v0.1 cover image
fp8
32k
$0.24 / Mtoken
  • text-generation

Mixtral is mixture of expert large language model (LLM) from Mistral AI. This is state of the art machine learning model using a mixture 8 of experts (MoE) 7b models. During inference 2 expers are selected. This architecture allows large models to be fast and cheap at inference. The Mixtral-8x7B outperforms Llama 2 70B on most benchmarks.

nvidia/Nemotron-4-340B-Instruct cover image
bfloat16
4k
Replaced
  • text-generation

Nemotron-4-340B-Instruct is a chat model intended for use for the English language, designed for Synthetic Data Generation

openai/clip-vit-base-patch32 cover image
$0.0005 / sec
  • zero-shot-image-classification

The CLIP model was developed by OpenAI to investigate the robustness of computer vision models. It uses a Vision Transformer architecture and was trained on a large dataset of image-caption pairs. The model shows promise in various computer vision tasks but also has limitations, including difficulties with fine-grained classification and potential biases in certain applications.

openai/clip-vit-large-patch14-336 cover image
$0.0005 / sec
  • zero-shot-image-classification

A zero-shot-image-classification model released by OpenAI. The clip-vit-large-patch14-336 model was trained from scratch on an unknown dataset and achieves unspecified results on the evaluation set. The model's intended uses and limitations, as well as its training and evaluation data, are not provided. The training procedure used an unknown optimizer and precision, and the framework versions included Transformers 4.21.3, TensorFlow 2.8.2, and Tokenizers 0.12.1.

openai/whisper-base cover image
Replaced
  • automatic-speech-recognition

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. It was trained on 680k hours of labelled data and demonstrates a strong ability to generalize to many datasets and domains without fine-tuning. The model is based on a Transformer encoder-decoder architecture. Whisper models are available for various languages including English, Spanish, French, German, Italian, Portuguese, Russian, Chinese, Japanese, Korean, and many more.

openai/whisper-base.en cover image
Replaced
  • automatic-speech-recognition

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. It was trained on 680k hours of labelled data and demonstrated a strong ability to generalise to many datasets and domains without fine-tuning. Whisper checks pens are available in five configurations of varying model sizes, including a smallest configuration trained on English-only data and a largest configuration trained on multilingual data. This one is English-only.

openai/whisper-medium.en cover image
Replaced
  • automatic-speech-recognition

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalise to many datasets and domains without fine-tuning. The primary intended users of these models are AI researchers studying robustness, generalisation, and capabilities of the current model.

openai/whisper-small.en cover image
Replaced
  • automatic-speech-recognition

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation, trained on 680k hours of labelled data without the need for fine-tuning. It is a Transformer based encoder-decoder model, trained on either English-only or multilingual data, and is available in five configurations of varying model sizes. The models were trained on the tasks of speech recognition and speech translation, predicting transcriptions in the same or different languages as the audio.

openai/whisper-timestamped-medium cover image
Replaced
  • automatic-speech-recognition

Whisper is a set of multi-lingual, robust speech recognition models trained by OpenAI that achieve state-of-the-art results in many languages. Whisper models were trained to predict approximate timestamps on speech segments (most of the time with 1-second accuracy), but they cannot originally predict word timestamps. This version has implementation to predict word timestamps and provide a more accurate estimation of speech segments when transcribing with Whisper models.

openai/whisper-timestamped-medium.en cover image
Replaced
  • automatic-speech-recognition

Whisper is a set of multi-lingual, robust speech recognition models trained by OpenAI that achieve state-of-the-art results in many languages. Whisper models were trained to predict approximate timestamps on speech segments (most of the time with 1-second accuracy), but they cannot originally predict word timestamps. This variant contains implementation to predict word timestamps and provide a more accurate estimation of speech segments when transcribing with Whisper models.

openai/whisper-tiny.en cover image
Replaced
  • automatic-speech-recognition

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation, trained on 680k hours of labeled data without fine-tuning. It's a Transformer based encoder-decoder model, trained on English-only or multilingual data, predicting transcriptions in the same or different language as the audio. Whisper checkpoints come in five configurations of varying model sizes.

openchat/openchat-3.6-8b cover image
bfloat16
8k
Replaced
  • text-generation

Openchat 3.6 is a LLama-3-8b fine tune that outperforms it on multiple benchmarks.

openchat/openchat_3.5 cover image
fp16
8k
Deprecated
  • text-generation

OpenChat is a library of open-source language models that have been fine-tuned with C-RLFT, a strategy inspired by offline reinforcement learning. These models can learn from mixed-quality data without preference labels and have achieved exceptional performance comparable to ChatGPT. The developers of OpenChat are dedicated to creating a high-performance, commercially viable, open-source large language model and are continuously making progress towards this goal.

runwayml/stable-diffusion-v1-5 cover image
Replaced
  • text-to-image

Most widely used version of Stable Diffusion. Trained on 512x512 images, it can generate realistic images given text description