Long Context models incoming

Published on 2023.11.21 by Iskren Chernev

Many users requested longer context models to help them summarize bigger chunks of text or write novels with ease.

We're proud to announce our long context model selection, which will keep growing in the coming weeks.

Models

Mistral-based models have a context size of 32k, and Amazon recently released a model fine-tuned specifically for longer contexts.

Mistral-7B -- 32k context
MistralLite -- 32k context, fine-tuned for longer context

We also recently released the highly praised Yi models. Keep in mind they don't support chat, just the old-school text completion (new models are in the works):

Yi-6B-200K -- 200K context
Yi-34B-200K -- 200K context

Context FAQ

What does context mean? - Context is the number of tokens the model can look at simultaneously. In practice, this is the limit on the tokens you send plus the max new tokens (2k by default) that can be generated. There is no 1:1 correspondence between tokens and words; a general rule of thumb is that 100 tokens is about 75 words.
Can I go above the context? - Technically -- no. Any API to a 4k model that lets you pass more will, one way or another, truncate the tokens given to the LLM to its context size. You (as a user) can also try shortening a long input (by removing some text from the middle, for example) and resubmitting it.
My input fits the context size, but the model doesn't take it into account - The context size of a model is a hard limit set when the model is created. Just because a model is listed as having a certain context size doesn't mean you can cram it with information and expect excellent results. Models differ in many parameters, and one such parameter is how far back they can recall information. Check the MistralLite page on Hugging Face for some metrics shared by Amazon.
What can I do to make the model take into account more of the data I sent? - There is no single answer to this question; it varies by model, and each model's capacity is different. The best you can do is test a particular model on a single task (like summarization or question answering) with different context lengths and find what works for you. Also try placing the system prompt at the start and/or the end. For the most control, you can use the text-completion endpoint (not the chat one).
Is a longer-context model guaranteed to understand more context than a short-context model? - Unfortunately -- no. For example, a very bad model with a huge context size may fail to act on a single sentence, whereas a good model with a shorter context length could comprehend a few paragraphs or even pages.
I can't find a good model for my large context needs! - Don't lose hope! This is a rapidly evolving ecosystem with models being released every day. We try our best to provide the best open-source models to you as quickly as possible. Let us know which models you like by emailing feedback@deepinfra.com or dropping us a message on Discord.
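
To make the context arithmetic from the FAQ concrete, here is a minimal Python sketch. It is not an official Deep Infra utility: it uses the 100-tokens-per-75-words rule of thumb instead of a real tokenizer, it assumes "2k" means 2000 tokens and "4k" means 4096, and every function name is hypothetical. It estimates whether a prompt plus the max new tokens fits a given context size, and if not, shortens the input by dropping words from the middle.

```python
# Assumption: "2k default" is read as 2000 tokens.
DEFAULT_MAX_NEW_TOKENS = 2000


def estimate_tokens(text: str) -> int:
    """Rough estimate: 100 tokens per 75 words, rounded up (not a tokenizer)."""
    words = len(text.split())
    return (words * 100 + 74) // 75


def fits_context(text: str, context_size: int,
                 max_new_tokens: int = DEFAULT_MAX_NEW_TOKENS) -> bool:
    """True if estimated prompt tokens + max new tokens stay within the context."""
    return estimate_tokens(text) + max_new_tokens <= context_size


def truncate_middle(text: str, context_size: int,
                    max_new_tokens: int = DEFAULT_MAX_NEW_TOKENS) -> str:
    """Keep the start and end of the text, dropping words from the middle."""
    budget_tokens = context_size - max_new_tokens
    if budget_tokens <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    words = text.split()
    # Invert the rule of thumb: how many words fit in the token budget.
    max_words = budget_tokens * 75 // 100
    if len(words) <= max_words:
        return text
    head = max_words // 2
    tail = max_words - head
    return " ".join(words[:head] + words[len(words) - tail:])


# Example: a 10,000-word input against a 4k (assumed 4096-token) context.
long_text = " ".join(f"w{i}" for i in range(10_000))
shortened = truncate_middle(long_text, context_size=4096)
print(fits_context(long_text, 4096), fits_context(shortened, 4096))  # False True
```

Because the estimate is only a rule of thumb, a real tokenizer for the specific model will give different counts, so it is worth leaving some headroom rather than filling the context to the last token.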
