Browse DeepInfra models:

All categories and models you can try out and use directly on DeepInfra:

meta-llama/Llama-2-7b-chat-hf
fp16
4k
Replaced
  • text-generation

Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the repository for the 7B fine-tuned model, optimized for dialogue use cases and converted for the Hugging Face Transformers format.

meta-llama/Llama-3.2-1B-Instruct
bfloat16
128k
$0.01 / Mtoken
  • text-generation

The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out).

meta-llama/Llama-3.2-3B-Instruct
bfloat16
128k
$0.015/$0.025 in/out Mtoken
  • text-generation

The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out).
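The split in/out prices listed above can be turned into a per-request cost estimate. A minimal sketch (prices hard-coded from the listing; the token counts are hypothetical):

```python
def request_cost(input_tokens, output_tokens, in_price_per_mtok, out_price_per_mtok):
    """Estimate the USD cost of one request given per-Mtoken prices."""
    return (input_tokens * in_price_per_mtok
            + output_tokens * out_price_per_mtok) / 1_000_000

# Llama-3.2-3B-Instruct: $0.015 in / $0.025 out per Mtoken (from the listing)
cost = request_cost(1_000_000, 1_000_000, 0.015, 0.025)
print(f"${cost:.3f} per 1M input + 1M output tokens")
```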

meta-llama/Meta-Llama-3-70B-Instruct
bfloat16
8k
$0.23/$0.40 in/out Mtoken
  • text-generation

Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes.

meta-llama/Meta-Llama-3-8B-Instruct
bfloat16
8k
$0.03/$0.06 in/out Mtoken
  • text-generation

Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes.
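Text-generation models in this catalog are typically called through DeepInfra's OpenAI-compatible chat-completions endpoint. A minimal request sketch, assuming the endpoint path follows the OpenAI convention (the API key is a placeholder, and the actual HTTP call is left commented out):

```python
import json

# Assumption: DeepInfra exposes an OpenAI-compatible chat-completions API.
# The model id matches the catalog entry above; the key is a placeholder.
API_URL = "https://api.deepinfra.com/v1/openai/chat/completions"
API_KEY = "YOUR_DEEPINFRA_API_KEY"

payload = {
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "messages": [{"role": "user", "content": "Summarize Llama 3 in one sentence."}],
    "max_tokens": 128,
}
headers = {"Authorization": f"Bearer {API_KEY}",
           "Content-Type": "application/json"}

print(json.dumps(payload, indent=2))
# To actually send the request (needs the `requests` package and a real key):
# import requests
# resp = requests.post(API_URL, headers=headers, json=payload, timeout=60)
# print(resp.json()["choices"][0]["message"]["content"])
```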

microsoft/Phi-3-medium-4k-instruct
bfloat16
4k
Replaced
  • text-generation

The Phi-3-Medium-4K-Instruct is a powerful and lightweight language model with 14 billion parameters, trained on high-quality data to excel in instruction following and safety measures. It demonstrates exceptional performance across benchmarks, including common sense, language understanding, and logical reasoning, outperforming models of similar size.

microsoft/WizardLM-2-7B
fp16
32k
Replaced
  • text-generation

WizardLM-2 7B is the smaller variant of Microsoft AI's latest Wizard model. It is the fastest variant and achieves performance comparable to leading open-source models 10x its size.

mistralai/Mistral-7B-Instruct-v0.1
fp16
32k
Replaced
  • text-generation

The Mistral-7B-Instruct-v0.1 Large Language Model (LLM) is an instruction fine-tuned version of the Mistral-7B-v0.1 generative text model, trained on a variety of publicly available conversation datasets.

mistralai/Mistral-7B-Instruct-v0.2
fp16
32k
Replaced
  • text-generation

The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an instruction fine-tuned version of the Mistral-7B-v0.2 generative text model, trained on a variety of publicly available conversation datasets.

mistralai/Mistral-7B-Instruct-v0.3
bfloat16
32k
$0.03/$0.055 in/out Mtoken
  • text-generation

Mistral-7B-Instruct-v0.3 is an instruction-tuned model, the next iteration of Mistral 7B, with a larger vocabulary, a newer tokenizer, and support for function calling.

mistralai/Mistral-Nemo-Instruct-2407
bfloat16
128k
$0.035/$0.08 in/out Mtoken
  • text-generation

A 12B model trained jointly by Mistral AI and NVIDIA; it significantly outperforms existing models of smaller or similar size.

mistralai/Mixtral-8x22B-Instruct-v0.1
bfloat16
64k
Replaced
  • text-generation

This is the instruction fine-tuned version of Mixtral-8x22B, the latest and largest mixture-of-experts large language model (LLM) from Mistral AI. This state-of-the-art model uses a mixture of 8 experts (MoE), each a 22B-parameter model; during inference, 2 experts are selected per token. This architecture lets large models run fast and cheaply at inference.

mistralai/Mixtral-8x22B-v0.1
fp16
64k
Replaced
  • text-generation

Mixtral-8x22B is the latest and largest mixture-of-experts large language model (LLM) from Mistral AI. This state-of-the-art model uses a mixture of 8 experts (MoE), each a 22B-parameter model; during inference, 2 experts are selected per token. This architecture lets large models run fast and cheaply at inference. This model is not instruction-tuned.

mistralai/Mixtral-8x7B-Instruct-v0.1
fp8
32k
$0.24 / Mtoken
  • text-generation

Mixtral is a mixture-of-experts large language model (LLM) from Mistral AI. This state-of-the-art model uses a mixture of 8 experts (MoE), each a 7B-parameter model; during inference, 2 experts are selected per token. This architecture lets large models run fast and cheaply at inference. Mixtral-8x7B outperforms Llama 2 70B on most benchmarks.
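The "select 2 of 8 experts" routing is why MoE inference stays cheap: only a fraction of the expert weights is active per token. A back-of-the-envelope sketch (it ignores the shared attention and embedding layers, so the numbers are illustrative only):

```python
def moe_active_fraction(num_experts, experts_per_token):
    """Fraction of expert parameters touched per token under top-k routing."""
    return experts_per_token / num_experts

# Mixtral-8x7B routes each token to 2 of its 8 experts.
frac = moe_active_fraction(8, 2)
print(f"{frac:.0%} of expert parameters active per token")  # prints "25% ..."
```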

nvidia/Nemotron-4-340B-Instruct
bfloat16
4k
Replaced
  • text-generation

Nemotron-4-340B-Instruct is a chat model intended for English-language use, designed for synthetic data generation.

openai/clip-vit-base-patch32
$0.0005 / sec
  • zero-shot-image-classification

The CLIP model was developed by OpenAI to investigate the robustness of computer vision models. It uses a Vision Transformer architecture and was trained on a large dataset of image-caption pairs. The model shows promise in various computer vision tasks but also has limitations, including difficulties with fine-grained classification and potential biases in certain applications.
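Zero-shot classification with CLIP works by embedding the image and a set of candidate captions into the same space, then scoring each caption by its similarity to the image. A minimal sketch of that scoring step, assuming the embeddings are already computed (the tiny vectors below are made up; real CLIP embeddings have hundreds of dimensions):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def softmax(xs):
    """Turn similarity scores into a probability distribution."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

image_emb = [0.9, 0.1, 0.0]              # hypothetical image embedding
captions = {
    "a photo of a cat": [1.0, 0.0, 0.0],  # hypothetical text embeddings
    "a photo of a dog": [0.0, 1.0, 0.0],
}
# CLIP scales similarities by a learned logit scale (~100) before softmax.
scores = softmax([100 * cosine(image_emb, e) for e in captions.values()])
for label, p in zip(captions, scores):
    print(f"{label}: {p:.3f}")
```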

openai/clip-vit-large-patch14-336
$0.0005 / sec
  • zero-shot-image-classification

A zero-shot image classification model released by OpenAI. According to its model card, clip-vit-large-patch14-336 was trained from scratch on an undisclosed dataset, and its results on the evaluation set, intended uses, limitations, and training and evaluation data are not documented. The training procedure used an unknown optimizer and precision; framework versions included Transformers 4.21.3, TensorFlow 2.8.2, and Tokenizers 0.12.1.

openai/whisper-base
Replaced
  • automatic-speech-recognition

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. It was trained on 680k hours of labelled data and demonstrates a strong ability to generalize to many datasets and domains without fine-tuning. The model is based on a Transformer encoder-decoder architecture. Whisper models are available for various languages including English, Spanish, French, German, Italian, Portuguese, Russian, Chinese, Japanese, Korean, and many more.