nvidia/
$0.20 in / $0.60 out per 1M tokens
NVIDIA Nemotron 2 Nano VL extends the Nemotron family into multimodal reasoning and document intelligence. This auto-regressive vision-language model supports multi-image reasoning, video understanding, visual Q&A, and document analysis and summarization. Optimized for enterprise AI workflows, it powers multimodal agentic systems such as visual copilots, document assistants, and knowledge automation pipelines.

You can POST to our OpenAI Chat Completions-compatible endpoint.
Passing the URL of an image is the easiest way to perform OCR.
curl "https://api.deepinfra.com/v1/openai/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPINFRA_TOKEN" \
  -d '{
    "model": "nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL",
    "max_tokens": 4092,
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image_url",
            "image_url": {
              "url": "https://url.com/to/shakespeare.png"
            }
          }
        ]
      }
    ]
  }'
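The request above sends only the image. In the OpenAI Chat Completions format, the `content` array of a user message can also carry a `text` part, which is one way to tell the model what to do with the image. The sketch below builds such a payload. The function name `build_payload` and the example prompt are illustrative and not part of the page above, and the sketch assumes that neither argument contains characters that need JSON escaping.

```shell
# Hypothetical helper: print a Chat Completions payload that pairs a text
# instruction ($1) with an image URL ($2).
# Assumption: neither argument contains quotes, backslashes, or newlines,
# because no JSON escaping is performed here.
build_payload() {
  cat <<EOF
{
  "model": "nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL",
  "max_tokens": 4092,
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "$1" },
        { "type": "image_url", "image_url": { "url": "$2" } }
      ]
    }
  ]
}
EOF
}

# Usage (needs DEEPINFRA_TOKEN to be set, so it is left commented out):
#   build_payload "Transcribe the text in this image." \
#     "https://url.com/to/shakespeare.png" |
#   curl "https://api.deepinfra.com/v1/openai/chat/completions" \
#     -H "Content-Type: application/json" \
#     -H "Authorization: Bearer $DEEPINFRA_TOKEN" \
#     -d @-
```

In an OpenAI-compatible response, the generated text is expected under `choices[0].message.content`.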
Another option is to read the image from a file and send it as a base64 data URL.
BASE64_IMAGE=$(base64 -w 0 shakespeare.png)
curl "https://api.deepinfra.com/v1/openai/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPINFRA_TOKEN" \
  -d @- <<EOF
{
  "model": "nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL",
  "max_tokens": 4092,
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,$BASE64_IMAGE"
          }
        }
      ]
    }
  ]
}
EOF
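The file-based request hard-codes `image/png` and relies on `base64 -w 0`, a GNU coreutils flag that other `base64` implementations may not accept. As a rough sketch, the helper below builds the data URL with a MIME type guessed from the file extension and strips line wraps with `tr` instead. The name `image_data_url` is hypothetical, and only PNG and JPEG are handled here. Which image formats the model accepts is not stated above.

```shell
# Hypothetical helper: print a base64 data URL for the image file in $1.
# The MIME type is guessed from the file extension only.
image_data_url() {
  case "$1" in
    *.png)        mime="image/png" ;;
    *.jpg|*.jpeg) mime="image/jpeg" ;;
    *)
      echo "unsupported image type: $1" >&2
      return 1
      ;;
  esac
  # Reading from stdin and deleting newlines avoids the -w flag.
  printf 'data:%s;base64,%s' "$mime" "$(base64 < "$1" | tr -d '\n')"
}

# Usage: substitute the result for the "url" value in the request body.
#   IMAGE_URL=$(image_data_url shakespeare.png)
```

Base64 output contains no characters that need JSON escaping, so the result can be placed directly inside the quoted `"url"` value.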
© 2026 Deep Infra. All rights reserved.