We use essential cookies to make our site work. With your consent, we may also use non-essential cookies to improve user experience and analyze website traffic…

NVIDIA Nemotron 3 Super - blazing-fast agentic AI, ready to deploy today!

Use OpenAI API clients with LLaMas
Published on 2023.08.28 by Iskren Chernev
Use OpenAI API clients with LLaMas

Getting started

# create a virtual environment
python3 -m venv .venv
# activate environment in current shell
. .venv/bin/activate
# install openai python client
pip install openai
copy

Choose a model

Run OpenAI chat.completion

import openai

stream = True # or False

# Point OpenAI client to our endpoint
openai.api_key = "<YOUR DEEPINFRA API KEY>"
openai.api_base = "https://api.deepinfra.com/v1/openai"

# Your chosen model here
MODEL_DI = "meta-llama/Llama-2-70b-chat-hf"
chat_completion = openai.ChatCompletion.create(
    model=MODEL_DI,
    messages=[{"role": "user", "content": "Hello world"}],
    stream=stream,
    max_tokens=100,
    # top_p=0.5,
)

if stream:
    # print the chat completion
    for event in chat_completion:
        print(event.choices)
else:
    print(chat_completion.choices[0].message.content)
copy

Note that both streaming and batch mode are supported.

Existing OpenAI integration

If you're already using OpenAI chat completion in your project, you need to change the api_key, api_base and model params:

import openai

# set these before running any completions
openai.api_key = "YOUR DEEPINFRA TOKEN"
openai.api_base = "https://api.deepinfra.com/v1/openai"

openai.ChatCompletion.create(
    model="CHOSEN MODEL HERE",
    # ...
)
copy

Pricing

Our OpenAI API compatible models are priced on token output (just like OpenAI). Our current price is $1 / 1M tokens.

Docs

Check the docs for more in-depth information and examples openai api.

Related articles
Introducing Nemotron 3 Super on DeepInfraIntroducing Nemotron 3 Super on DeepInfraDeepInfra is an official launch partner for NVIDIA Nemotron 3 Super, the latest open model in the Nemotron family, purpose-built for complex multi-agent applications with a 1M token context window and hybrid MoE architecture.
Accelerating Reasoning Workflows with Nemotron 3 Nano on DeepInfraAccelerating Reasoning Workflows with Nemotron 3 Nano on DeepInfraDeepInfra is an official launch partner for NVIDIA Nemotron 3 Nano, the newest open reasoning model in the Nemotron family. Our goal is to give developers, researchers, and teams the fastest and simplest path to using Nemotron 3 Nano from day one.
How to use CivitAI LoRAs: 5-Minute AI Guide to Stunning Double Exposure ArtHow to use CivitAI LoRAs: 5-Minute AI Guide to Stunning Double Exposure ArtLearn how to create mesmerizing double exposure art in minutes using AI. This guide shows you how to set up a LoRA model from CivitAI and create stunning artistic compositions that blend multiple images into dreamlike masterpieces.