Browse DeepInfra models:

All categories and models you can try out and use directly on DeepInfra:

Category: text-generation

Text generation AI models can generate coherent and natural-sounding human language text, making them useful for a variety of applications from language translation to content creation.

There are several types of text generation AI models, including rule-based, statistical, and neural models. Neural models, and in particular transformer-based models like GPT, have achieved state-of-the-art results in text generation tasks. These models use artificial neural networks to analyze large text corpora and learn the patterns and structures of language.

While text generation AI models offer many exciting possibilities, they also present some challenges. For example, it's essential to ensure that the generated text is ethical, unbiased, and accurate, to avoid potential harm or negative consequences.

Austism/chronos-hermes-13b-v2 cover image
fp16
4k
Replaced
  • text-generation

Chronos-Hermes-13b-v2 offers the imaginative writing style of Chronos while retaining coherency and capability. Outputs are long and use exceptional prose. It supports a maximum context length of 4096 tokens and follows the Alpaca prompt format.
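The Alpaca prompt format mentioned above wraps the user's instruction in fixed section headers. A minimal sketch of the commonly used template — exact whitespace handling may vary between deployments, so treat this as illustrative:

```python
# The widely used Alpaca instruction template; individual deployments
# may differ slightly in preamble wording or whitespace.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

def build_alpaca_prompt(instruction: str) -> str:
    """Format a user instruction in the Alpaca style."""
    return ALPACA_TEMPLATE.format(instruction=instruction)

prompt = build_alpaca_prompt("Write a short story about a lighthouse keeper.")
print(prompt)
```

The model then continues generating after the `### Response:` header.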

Gryphe/MythoMax-L2-13b-turbo cover image
fp8
4k
Replaced
  • text-generation

A faster version of Gryphe/MythoMax-L2-13b, running on multiple H100 cards in fp8 precision at up to 160 tokens per second.

HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1 cover image
fp8
64k
Replaced
  • text-generation

Zephyr 141B-A35B is an instruction-tuned (assistant) version of Mixtral-8x22B. It was fine-tuned on a mix of publicly available, synthetic datasets. It achieves strong performance on chat benchmarks.

KoboldAI/LLaMA2-13B-Tiefighter cover image
fp16
4k
Replaced
  • text-generation

LLaMA2-13B-Tiefighter is a highly creative and versatile language model, fine-tuned for storytelling, adventure, and conversational dialogue. It combines the strengths of multiple models and datasets, including retro-rodeo and choose-your-own-adventure, to generate engaging and imaginative content. With its ability to improvise and adapt to different styles and formats, Tiefighter is perfect for writers, creators, and anyone looking to spark their imagination.

NousResearch/Hermes-3-Llama-3.1-405B cover image
fp8
128k
$0.80 / Mtoken
  • text-generation

Hermes 3 is a cutting-edge language model that offers advanced capabilities in roleplaying, reasoning, and conversation. It's a fine-tuned version of the Llama-3.1 405B foundation model, designed to align with user needs and provide powerful control. Key features include reliable function calling, structured output, generalist assistant capabilities, and improved code generation. Hermes 3 is competitive with Llama-3.1 Instruct models, with its own strengths and weaknesses.
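Priced models like Hermes 3 are typically called through DeepInfra's OpenAI-compatible chat completions API. A minimal sketch of building such a request payload — the endpoint URL and field names follow the usual OpenAI-style conventions and are assumptions to check against DeepInfra's current docs:

```python
import json

# Assumed OpenAI-compatible endpoint; verify against DeepInfra's docs.
API_URL = "https://api.deepinfra.com/v1/openai/chat/completions"

# Standard OpenAI-style chat payload targeting the model listed above.
payload = {
    "model": "NousResearch/Hermes-3-Llama-3.1-405B",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain function calling in one sentence."},
    ],
    "max_tokens": 256,
}

body = json.dumps(payload)
print(body)
```

The serialized `body` would be POSTed to `API_URL` with a bearer token in the `Authorization` header.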

NovaSky-AI/Sky-T1-32B-Preview cover image
fp16
32k
$0.12/$0.18 in/out Mtoken
  • text-generation

This is a 32B reasoning model trained from Qwen2.5-32B-Instruct on 17K training examples. Its performance is on par with o1-preview on both math and coding benchmarks.
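Per-token pricing such as the $0.12/$0.18 in/out per Mtoken above translates into request cost with simple arithmetic. A small illustrative helper:

```python
def request_cost_usd(in_tokens: int, out_tokens: int,
                     in_price_per_mtoken: float,
                     out_price_per_mtoken: float) -> float:
    """Cost of one request given per-million-token (Mtoken) prices."""
    return (in_tokens * in_price_per_mtoken
            + out_tokens * out_price_per_mtoken) / 1_000_000

# Sky-T1-32B-Preview prices from the listing: $0.12 in / $0.18 out per Mtoken.
# A 2,000-token prompt with a 1,000-token completion:
cost = request_cost_usd(2_000, 1_000, 0.12, 0.18)
print(f"${cost:.6f}")  # → $0.000420
```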

Phind/Phind-CodeLlama-34B-v2 cover image
fp16
4k
Replaced
  • text-generation

Phind-CodeLlama-34B-v2 is an open-source language model fine-tuned on 1.5B tokens of high-quality programming-related data, achieving a pass@1 rate of 73.8% on HumanEval. It is multilingual and proficient in Python, C/C++, TypeScript, Java, and more. It was trained on a proprietary dataset of instruction-answer pairs rather than code-completion examples. The model is instruction-tuned on the Alpaca/Vicuna format, making it steerable and easy to use, and generates one completion per prompt.
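The pass@1 figure quoted for HumanEval is conventionally estimated with the unbiased pass@k formula introduced alongside that benchmark: generate n samples per problem, count the c that pass, and estimate the chance that at least one of k draws passes. A minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k).
    n = samples generated per problem, c = samples that passed,
    k = evaluation budget."""
    if n - c < k:
        return 1.0  # fewer failures than draws: a pass is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 samples with 5 passing gives a pass@1 estimate of 0.5.
print(pass_at_k(10, 5, 1))
```

Benchmark-wide pass@1 is then the mean of this estimate over all problems.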

Qwen/QVQ-72B-Preview cover image
bfloat16
125k
$0.25/$0.50 in/out Mtoken
  • text-generation

QVQ-72B-Preview is an experimental research model developed by the Qwen team, focused on enhancing visual reasoning capabilities. It has achieved strong results across benchmarks, scoring 70.3% on the Multimodal Massive Multi-task Understanding (MMMU) benchmark.

Qwen/Qwen2-72B-Instruct cover image
bfloat16
32k
Replaced
  • text-generation

The 72 billion parameter Qwen2 excels in language understanding, multilingual capabilities, coding, mathematics, and reasoning.

Qwen/Qwen2-7B-Instruct cover image
bfloat16
32k
Replaced
  • text-generation

The 7 billion parameter Qwen2 excels in language understanding, multilingual capabilities, coding, mathematics, and reasoning.

Qwen/Qwen2.5-7B-Instruct cover image
bfloat16
32k
$0.025/$0.05 in/out Mtoken
  • text-generation

The 7 billion parameter Qwen2.5 excels in language understanding, multilingual capabilities, coding, mathematics, and reasoning.

Qwen/Qwen2.5-Coder-7B cover image
32k
Replaced
  • text-generation

Qwen2.5-Coder-7B is a powerful code-specific large language model with 7.61 billion parameters. It's designed for code generation, reasoning, and fixing tasks. The model covers 92 programming languages and has been trained on 5.5 trillion tokens of data, including source code, text-code grounding, and synthetic data.

Sao10K/L3-70B-Euryale-v2.1 cover image
fp8
8k
$0.70/$0.80 in/out Mtoken
  • text-generation

Euryale 70B v2.1 is a model focused on creative roleplay from Sao10K.

Sao10K/L3-8B-Lunaris-v1 cover image
bfloat16
8k
Deprecated
  • text-generation

A generalist / roleplaying model merge based on Llama 3. Sao10K carefully selected the merge values through extensive personal experimentation and fine-tuned them into a customized recipe.

Sao10K/L3.1-70B-Euryale-v2.2 cover image
fp8
128k
$0.70/$0.80 in/out Mtoken
  • text-generation

Euryale 3.1 - 70B v2.2 is a model focused on creative roleplay from Sao10K.

Sao10K/L3.3-70B-Euryale-v2.3 cover image
fp8
128k
$0.70/$0.80 in/out Mtoken
  • text-generation

L3.3-70B-Euryale-v2.3 is a model focused on creative roleplay from Sao10K.