Qwen3-Max-Thinking state-of-the-art reasoning model at your fingertips!

# create a virtual environment
python3 -m venv .venv
# activate environment in current shell
. .venv/bin/activate
# install openai python client
pip install openai
import openai
stream = True # or False
# Point OpenAI client to our endpoint
openai.api_key = "<YOUR DEEPINFRA API KEY>"
openai.api_base = "https://api.deepinfra.com/v1/openai"
# Your chosen model here
MODEL_DI = "meta-llama/Llama-2-70b-chat-hf"
chat_completion = openai.ChatCompletion.create(
model=MODEL_DI,
messages=[{"role": "user", "content": "Hello world"}],
stream=stream,
max_tokens=100,
# top_p=0.5,
)
if stream:
# print the chat completion
for event in chat_completion:
print(event.choices)
else:
print(chat_completion.choices[0].message.content)
Note that both streaming and batch mode are supported.
If you're already using OpenAI chat completion in your project, you need to
change the api_key, api_base and model params:
import openai
# set these before running any completions
openai.api_key = "YOUR DEEPINFRA TOKEN"
openai.api_base = "https://api.deepinfra.com/v1/openai"
openai.ChatCompletion.create(
model="CHOSEN MODEL HERE",
# ...
)
Our OpenAI API compatible models are priced on token output (just like OpenAI). Our current price is $1 / 1M tokens.
Check the docs for more in-depth information and examples openai api.
Building Efficient AI Inference on NVIDIA Blackwell PlatformDeepInfra delivers up to 20x cost reductions on NVIDIA Blackwell by combining MoE architectures, NVFP4 quantization, and inference optimizations — with a Latitude case study.
Qwen API Pricing Guide 2026: Max Performance on a Budget<p>If you have been following the AI leaderboards lately, you have likely noticed a new name constantly trading blows with GPT-4o and Claude 3.5 Sonnet: Qwen. Developed by Alibaba Cloud, the Qwen model family (specifically Qwen 2.5 and Qwen 3) has exploded in popularity for one simple reason: unbeatable price-to-performance. In 2025, Qwen is widely […]</p>
How to deploy google/flan-ul2 - simple. (open source ChatGPT alternative)Flan-UL2 is probably the best open source model available right now for chatbots. In this post
we will show you how to get started with it very easily. Flan-UL2 is large -
20B parameters. It is fine tuned version of the UL2 model using Flan dataset.
Because this is quite a large model it is not eas...© 2026 Deep Infra. All rights reserved.