
Browse deepinfra models:

All categories and models you can try out and directly use in deepinfra:

text-generation

automatic-speech-recognition

zero-shot-image-classification

text-generation

mistralai/Mixtral-8x7B-Instruct-v0.1

Mixtral is a mixture-of-experts (MoE) large language model (LLM) from Mistral AI. It is a state-of-the-art model that combines 8 experts, each based on a 7B model. During inference, 2 experts are selected per token. This architecture allows large models to be fast and cheap at inference. Mixtral-8x7B outperforms Llama 2 70B on most benchmarks.

$0.54 / 1M tokens
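The expert selection described for Mixtral can be illustrated with a toy router. This is a minimal sketch, not Mistral AI's implementation: the function names, the toy experts, and the router logits are all made up for illustration, and it assumes the two selected experts are weighted by a softmax over their own router scores.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top2_route(router_logits):
    """Pick the 2 highest-scoring experts and renormalize their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:2]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))

def moe_layer(x, experts, router_logits):
    """Weighted sum of the outputs of the two selected experts only."""
    return sum(w * experts[i](x) for i, w in top2_route(router_logits))

# Eight toy "experts" (hypothetical): expert k multiplies its input by k + 1.
experts = [lambda x, k=k: (k + 1) * x for k in range(8)]

# Hypothetical router scores for a single token.
router_logits = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 1.0]

routes = top2_route(router_logits)
output = moe_layer(1.0, experts, router_logits)
print(routes, output)
```

Only two of the eight experts run for the token, which is why inference cost stays close to that of a much smaller dense model.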

text-generation

nvidia/Llama-3.1-Nemotron-70B-Instruct

Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM-generated responses to user queries. It scores 85.0 on Arena Hard, 57.6 on AlpacaEval 2 LC, and 8.98 on GPT-4-Turbo MT-Bench, three benchmarks known to be predictive of LMSys Chatbot Arena Elo. As of 16 Oct 2024, this model is #1 on all three automatic alignment benchmarks (verified tab for AlpacaEval 2 LC), edging out strong frontier models such as GPT-4o and Claude 3.5 Sonnet.

$1.20 / 1M tokens

text-generation

nvidia/Llama-3.3-Nemotron-Super-49B-v1.5

Llama-3.3-Nemotron-Super-49B-v1.5 is a large language model (LLM) optimized for advanced reasoning, conversational interactions, retrieval-augmented generation (RAG), and tool-calling tasks. Derived from Meta's Llama-3.3-70B-Instruct, it employs a Neural Architecture Search (NAS) approach, significantly enhancing efficiency and reducing memory requirements.

$0.10 in, $0.40 out / 1M

text-generation

nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL

The model is an auto-regressive vision language model that uses an optimized transformer architecture. The model enables multi-image reasoning and video understanding, along with strong document intelligence, visual Q&A and summarization capabilities.

$0.20 in, $0.60 out / 1M

text-generation

nvidia/NVIDIA-Nemotron-Nano-9B-v2

NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be controlled via a system prompt. If the user prefers the model to provide its final answer without intermediate reasoning traces, it can be configured to do so.

$0.04 in, $0.16 out / 1M

image-classification

openai/clip-vit-base-patch32

The CLIP model was developed by OpenAI to investigate the robustness of computer vision models. It uses a Vision Transformer architecture and was trained on a large dataset of image-caption pairs. The model shows promise in various computer vision tasks but also has limitations, including difficulties with fine-grained classification and potential biases in certain applications.

$0.0005 / second

image-classification

openai/clip-vit-large-patch14-336

A zero-shot-image-classification model released by OpenAI. The clip-vit-large-patch14-336 model was trained from scratch on an unknown dataset and achieves unspecified results on the evaluation set. The model's intended uses and limitations, as well as its training and evaluation data, are not provided. The training procedure used an unknown optimizer and precision, and the framework versions included Transformers 4.21.3, TensorFlow 2.8.2, and Tokenizers 0.12.1.

$0.0005 / second

text-generation

openai/gpt-oss-120b-Turbo

$0.15 in, $0.60 out / 1M

speech-recognition

openai/whisper-large-v3

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.

$0.00045 / minute

sentence-transformers/all-MiniLM-L12-v2

We present a sentence transformation model that generates semantically similar sentences. Our model is based on the Sentence-Transformers architecture and was trained on a large dataset of sentence pairs. We evaluate the effectiveness of our model by measuring its ability to generate similar sentences that are close to the original sentence in meaning.

$0.005 / 1M tokens

sentence-transformers/all-MiniLM-L6-v2

We present a sentence transformation model that achieves state-of-the-art results on various NLP tasks without requiring task-specific architectures or fine-tuning. Our approach leverages contrastive learning and utilizes a variety of datasets to learn robust sentence representations. We evaluate our model on several benchmarks and demonstrate its effectiveness in various applications such as text classification, sentiment analysis, named entity recognition, and question answering.

$0.005 / 1M tokens

sentence-transformers/all-mpnet-base-v2

A sentence transformation model trained on a wide range of datasets, including but not limited to S2ORC, WikiAnswers, PAQ, Stack Exchange, and Yahoo! Answers. It can be used for various NLP tasks such as clustering, sentiment analysis, and question answering.

$0.005 / 1M tokens

sentence-transformers/clip-ViT-B-32

The CLIP model maps text and images to a shared vector space, enabling various applications such as image search, zero-shot image classification, and image clustering. The model can be used easily after installation, and its performance is demonstrated through zero-shot ImageNet validation set accuracy scores. Multilingual versions of the model are also available for 50+ languages.

$0.005 / 1M tokens
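The shared vector space mentioned in the CLIP description is what makes zero-shot classification possible: an image embedding is compared against the embeddings of candidate text labels, and the closest label wins. The sketch below uses made-up 3-dimensional vectors in place of real model outputs. The `temperature` value and all names are assumptions for illustration, not part of the model's API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_classify(image_vec, label_vecs, temperature=0.01):
    """Map each label to a probability via softmax over scaled similarities."""
    sims = {label: cosine(image_vec, vec) / temperature
            for label, vec in label_vecs.items()}
    m = max(sims.values())
    exps = {label: math.exp(s - m) for label, s in sims.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Made-up embeddings; a real model would produce much higher-dimensional vectors.
image_vec = [0.9, 0.1, 0.2]
label_vecs = {
    "a photo of a dog": [1.0, 0.0, 0.1],
    "a photo of a cat": [0.0, 1.0, 0.1],
}

probs = zero_shot_classify(image_vec, label_vecs)
best = max(probs, key=probs.get)
print(best, probs)
```

The same similarity function also covers image search and clustering, since both reduce to comparing vectors in the shared space.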

sentence-transformers/

sentence-transformers/clip-ViT-B-32-multilingual-v1

This model is a multilingual version of the OpenAI CLIP-ViT-B32 model, which maps text and images to a common dense vector space. It includes a text embedding model that works for 50+ languages and an image encoder from CLIP. The model was trained using Multilingual Knowledge Distillation, where a multilingual DistilBERT model was trained as a student model to align the vector space of the original CLIP image encoder across many languages.

$0.005 / 1M tokens

sentence-transformers/multi-qa-mpnet-base-dot-v1

We present a sentence transformation model that maps sentences and paragraphs to a 768-dimensional dense vector space, suitable for semantic search tasks. The model is trained on 215 million question-answer pairs from various sources, including WikiAnswers, PAQ, Stack Exchange, MS MARCO, GOOAQ, Amazon QA, Yahoo Answers, Search QA, ELI5, and Natural Questions. Our model uses a contrastive learning objective.

$0.005 / 1M tokens

sentence-transformers/paraphrase-MiniLM-L6-v2

We present a sentence similarity model based on the Sentence Transformers architecture, which maps sentences to a 384-dimensional dense vector space. The model uses a pre-trained BERT encoder and applies mean pooling on top of the contextualized word embeddings to obtain sentence embeddings. We evaluate the model on the Sentence Embeddings Benchmark.

$0.005 / 1M tokens
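The mean pooling step named in the paraphrase-MiniLM-L6-v2 description can be sketched in a few lines. This is an illustrative version on plain Python lists with tiny 2-dimensional vectors rather than the model's 384 dimensions. The function name and the attention-mask convention (1 for real tokens, 0 for padding) are assumptions.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1, ignoring padding."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]

# Two real tokens followed by one padding token (hypothetical values).
token_embeddings = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
attention_mask = [1, 1, 0]

sentence_embedding = mean_pool(token_embeddings, attention_mask)
print(sentence_embedding)  # [2.0, 3.0]
```

Masking matters because padded positions would otherwise pull the sentence vector toward arbitrary values.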

shibing624/text2vec-base-chinese

A sentence similarity model that can be used for various NLP tasks such as text classification, sentiment analysis, named entity recognition, question answering, and more. It utilizes the CoSENT architecture, which consists of a transformer encoder and a pooling module, to encode input texts into vectors that capture their semantic meaning. The model was trained on the nli_zh dataset and achieved high performance on various benchmark datasets.

$0.005 / 1M tokens

stabilityai/sdxl-turbo

The SDXL Turbo model, developed by Stability AI, is an optimized, fast text-to-image generative model. It is a distilled version of SDXL 1.0, leveraging Adversarial Diffusion Distillation (ADD) to generate high-quality images in fewer steps.

$0.0002 x (width / 1024) x (height / 1024) x (iters / 5)
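The pricing formula above scales a base price by image area and iteration count. A small helper makes the arithmetic concrete. The function and parameter names are hypothetical, and the result is read as US dollars per generated image.

```python
def sdxl_turbo_cost(width, height, iters, base=0.0002):
    """Price of one image: base x (width / 1024) x (height / 1024) x (iters / 5)."""
    return base * (width / 1024) * (height / 1024) * (iters / 5)

# A 1024x1024 image at 5 iterations costs exactly the base price.
cost_default = sdxl_turbo_cost(1024, 1024, 5)

# A 512x512 image at 1 iteration: a quarter of the area, a fifth of the steps.
cost_small = sdxl_turbo_cost(512, 512, 1)

print(f"${cost_default:.6f}  ${cost_small:.6f}")
```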

thenlper/gte-base

The GTE models are trained by Alibaba DAMO Academy. They are mainly based on the BERT framework and currently offer three different sizes of models, including GTE-large, GTE-base, and GTE-small. The GTE models are trained on a large-scale corpus of relevance text pairs, covering a wide range of domains and scenarios. This enables the GTE models to be applied to various downstream tasks of text embeddings, including information retrieval, semantic textual similarity, text reranking, etc.

$0.005 / 1M tokens

thenlper/gte-large

The GTE models are trained by Alibaba DAMO Academy. They are mainly based on the BERT framework and currently offer three different sizes of models, including GTE-large, GTE-base, and GTE-small. The GTE models are trained on a large-scale corpus of relevance text pairs, covering a wide range of domains and scenarios. This enables the GTE models to be applied to various downstream tasks of text embeddings, including information retrieval, semantic textual similarity, text reranking, etc.

$0.010 / 1M tokens

text-generation

zai-org/GLM-4.6

Compared with GLM-4.5, GLM-4.6 brings several key improvements:

Longer context window: The context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex agentic tasks.

Superior coding performance: The model achieves higher scores on code benchmarks and demonstrates better real-world performance in applications such as Claude Code, Cline, Roo Code, and Kilo Code, including improvements in generating visually polished front-end pages.

Advanced reasoning: GLM-4.6 shows a clear improvement in reasoning performance and supports tool use during inference, leading to stronger overall capability.

More capable agents: GLM-4.6 exhibits stronger performance in tool use and search-based agents, and integrates more effectively within agent frameworks.

Refined writing: GLM-4.6 better aligns with human preferences in style and readability, and performs more naturally in role-playing scenarios.

$0.11 cached, $0.45 in, $1.90 out / 1M
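Prices quoted per 1M tokens with separate cached, input, and output rates can be turned into a per-request estimate. The sketch below uses the GLM-4.6 rates listed above. It assumes that cached input tokens are billed at the cached rate and the remaining input tokens at the input rate, which the listing does not spell out. The function and parameter names are hypothetical.

```python
def token_cost(cached_in, uncached_in, out,
               cached_rate=0.11, in_rate=0.45, out_rate=1.90):
    """Estimated cost in dollars, with all rates expressed per 1M tokens."""
    per_million = 1_000_000
    total = cached_in * cached_rate + uncached_in * in_rate + out * out_rate
    return total / per_million

# One million tokens of each kind: 0.11 + 0.45 + 1.90 dollars.
cost_full = token_cost(1_000_000, 1_000_000, 1_000_000)

# A request with 8,000 cached prompt tokens, 2,000 new prompt tokens,
# and 500 generated tokens.
cost_request = token_cost(8_000, 2_000, 500)

print(f"${cost_full:.2f}  ${cost_request:.6f}")
```

For models listed with only "in" and "out" prices, the same helper applies with `cached_in` set to 0.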


© 2025 Deep Infra. All rights reserved.
