FLUX.2 is live! High-fidelity image generation made simple.

# create a virtual environment
python3 -m venv .venv
# activate environment in current shell
. .venv/bin/activate
# install openai python client
pip install openai
import openai
stream = True # or False
# Point OpenAI client to our endpoint
openai.api_key = "<YOUR DEEPINFRA API KEY>"
openai.api_base = "https://api.deepinfra.com/v1/openai"
# Your chosen model here
MODEL_DI = "meta-llama/Llama-2-70b-chat-hf"
chat_completion = openai.ChatCompletion.create(
model=MODEL_DI,
messages=[{"role": "user", "content": "Hello world"}],
stream=stream,
max_tokens=100,
# top_p=0.5,
)
if stream:
# print the chat completion
for event in chat_completion:
print(event.choices)
else:
print(chat_completion.choices[0].message.content)
Note that both streaming and batch mode are supported.
If you're already using OpenAI chat completion in your project, you need to
change the api_key, api_base and model params:
import openai
# set these before running any completions
openai.api_key = "YOUR DEEPINFRA TOKEN"
openai.api_base = "https://api.deepinfra.com/v1/openai"
openai.ChatCompletion.create(
model="CHOSEN MODEL HERE",
# ...
)
Our OpenAI API compatible models are priced on token output (just like OpenAI). Our current price is $1 / 1M tokens.
Check the docs for more in-depth information and examples openai api.
Art That Talks Back: A Hands-On Tutorial on Talking ImagesTurn any image into a talking masterpiece with this step-by-step guide using DeepInfra’s GenAI models.
Deploy Custom LLMs on DeepInfraDid you just finetune your favorite model and are wondering where to run it?
Well, we have you covered. Simple API and predictable pricing.
Put your model on huggingface
Use a private repo, if you wish, we don't mind. Create a hf access token just
for the repo for better security.
Create c...
Langchain improvements: async and streamingStarting from langchain
v0.0.322 you
can make efficient async generation and streaming tokens with deepinfra.
Async generation
The deepinfra wrapper now supports native async calls, so you can expect more
performance (no more t...© 2026 Deep Infra. All rights reserved.