meta-llama/Llama-3.2-11B-Vision-Instruct

$0.049 / 1M tokens

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed for tasks that combine visual and textual data. It excels at image captioning and visual question answering, bridging language generation and visual reasoning. Pre-trained on a large dataset of image-text pairs, it performs well on complex image analysis tasks. Its combination of visual understanding and language processing suits applications such as content creation, AI-driven customer service, and research.

Public endpoint, fp8 quantization, 131,072-token context, JSON mode, multimodal

OpenAI-compatible HTTP API

You can POST to our OpenAI Chat Completions compatible endpoint.

Passing an image URL is the easiest way to perform OCR.

curl "https://api.deepinfra.com/v1/openai/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPINFRA_TOKEN" \
  -d '{
      "model": "meta-llama/Llama-3.2-11B-Vision-Instruct",
      "max_tokens": 4092,
      "messages": [
        {
          "role": "user",
          "content": [
            {
              "type": "image_url",
              "image_url": {
                "url": "https://url.com/to/shakespeare.png"
              }
            }
          ]
        }
      ]
    }'
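The request above sends only an image. A minimal sketch of pairing the image with a text instruction follows, assuming the endpoint accepts the OpenAI-style "text" content part next to "image_url". The helper name build_payload and the prompt wording are hypothetical, not part of the API.

```shell
# Hypothetical helper: print a request body that pairs a text instruction
# with an image URL. $1 is the prompt, $2 is the image URL.
# The arguments are interpolated as-is, so they must not contain
# double quotes or backslashes.
build_payload() {
  cat <<EOF
{
  "model": "meta-llama/Llama-3.2-11B-Vision-Instruct",
  "max_tokens": 4092,
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "$1" },
        { "type": "image_url", "image_url": { "url": "$2" } }
      ]
    }
  ]
}
EOF
}

# Usage (requires DEEPINFRA_TOKEN to be set):
# build_payload "Transcribe the text in this image." "https://url.com/to/shakespeare.png" |
#   curl "https://api.deepinfra.com/v1/openai/chat/completions" \
#     -H "Content-Type: application/json" \
#     -H "Authorization: Bearer $DEEPINFRA_TOKEN" \
#     -d @-
```

Building the body in a function keeps the prompt and image URL in one place. For prompts containing quotes, a JSON-aware tool would be a safer way to build the body.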

Another option is to read the image from a file and send it as a base64-encoded data URL.


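# Encode the image as a single line of base64 (-w 0 disables line wrapping in GNU coreutils base64).
BASE64_IMAGE=$(base64 -w 0 shakespeare.png)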

curl "https://api.deepinfra.com/v1/openai/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPINFRA_TOKEN" \
  -d @- <<EOF
{
  "model": "meta-llama/Llama-3.2-11B-Vision-Instruct",
  "max_tokens": 4092,
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,$BASE64_IMAGE"
          }
        }
      ]
    }
  ]
}
EOF
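The file-based request depends on a single-line base64 string and a matching MIME type in the data URL. A minimal sketch of a helper that builds the data URL follows, assuming PNG and JPEG inputs are accepted. The function name image_data_url is hypothetical, and it strips newlines with tr rather than relying on the -w flag.

```shell
# Hypothetical helper: print a data URL for a local image file ($1).
# The MIME type is chosen from the file extension; other extensions are rejected.
image_data_url() {
  case "$1" in
    *.png) mime="image/png" ;;
    *.jpg|*.jpeg) mime="image/jpeg" ;;
    *) echo "unsupported extension: $1" >&2; return 1 ;;
  esac
  # Read the file on stdin and remove newlines so the output is one line.
  printf 'data:%s;base64,%s' "$mime" "$(base64 < "$1" | tr -d '\n')"
}

# Usage: IMAGE_URL=$(image_data_url shakespeare.png)
# then place $IMAGE_URL in the "url" field of the request body.
```

Reading the file on stdin and stripping newlines avoids depending on how a particular base64 implementation wraps its output.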

