The CLIP model maps text and images into a shared vector space, enabling applications such as image search, zero-shot image classification, and image clustering. The model is straightforward to use after installation, and its quality is typically reported as zero-shot accuracy on the ImageNet validation set. Multilingual versions of the model are available for 50+ languages.
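As a sketch of what the shared vector space enables, the snippet below embeds an image and a few candidate labels locally with the sentence-transformers package and ranks the labels by cosine similarity (zero-shot classification). The image path and the label strings are placeholders.

# A minimal local sketch, assuming the sentence-transformers and Pillow
# packages are installed; "photo.jpg" and the labels are placeholders.
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")

# Encode one image and several candidate labels into the same vector space.
image_embedding = model.encode(Image.open("photo.jpg"))
label_embeddings = model.encode([
    "a photo of a dog",
    "a photo of a cat",
    "a photo of a car",
])

# Cosine similarity ranks the labels; the highest score is the zero-shot prediction.
scores = util.cos_sim(image_embedding, label_embeddings)
print(scores)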
DeepInfra supports the OpenAI embeddings API. The following request creates an embedding vector representing the input text:
curl "https://api.deepinfra.com/v1/openai/embeddings" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $DEEPINFRA_TOKEN" \
-d '{
"input": "The food was delicious and the waiter...",
"model": "sentence-transformers/clip-ViT-B-32",
"encoding_format": "float"
}'
which will return something similar to
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        -0.010480394586920738,
        -0.0026091758627444506,
        ...
        0.031979579478502274,
        0.02021978422999382
      ]
    }
  ],
  "model": "sentence-transformers/clip-ViT-B-32",
  "usage": {
    "prompt_tokens": 12,
    "total_tokens": 12
  }
}
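Because the endpoint is OpenAI-compatible, the same request can also be made from the official openai Python client by pointing its base_url at DeepInfra. A minimal sketch, assuming the openai package is installed and DEEPINFRA_TOKEN is set in the environment:

import os
from openai import OpenAI

# Point the OpenAI client at DeepInfra's OpenAI-compatible endpoint.
client = OpenAI(
    api_key=os.environ["DEEPINFRA_TOKEN"],
    base_url="https://api.deepinfra.com/v1/openai",
)

response = client.embeddings.create(
    model="sentence-transformers/clip-ViT-B-32",
    input="The food was delicious and the waiter...",
    encoding_format="float",
)

embedding = response.data[0].embedding
print(len(embedding), embedding[:4])  # vector length and first few values
print(response.usage.prompt_tokens, response.usage.total_tokens)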
service_tier (string)
The service tier used for processing the request. When set to 'priority', the request is processed with higher priority.
Allowed values: default, priority

dimensions (integer)
The number of dimensions in the embedding. If not provided, the model's default is used. If the value is larger than the model's default, the embedding is padded with zeros.
Range: dimensions ≥ 32
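As a sketch of how these optional parameters fit into a request, the snippet below sends them as extra JSON fields using the requests library; the dimensions value of 1024 is an arbitrary illustration.

import os
import requests

# A sketch of passing the optional parameters documented above as JSON fields;
# 1024 is an example value (any integer >= 32 is allowed).
response = requests.post(
    "https://api.deepinfra.com/v1/openai/embeddings",
    headers={"Authorization": f"Bearer {os.environ['DEEPINFRA_TOKEN']}"},
    json={
        "model": "sentence-transformers/clip-ViT-B-32",
        "input": "The food was delicious and the waiter...",
        "encoding_format": "float",
        "dimensions": 1024,         # values larger than the model default are zero-padded
        "service_tier": "default",  # or "priority" for higher-priority processing
    },
    timeout=30,
)
response.raise_for_status()
print(len(response.json()["data"][0]["embedding"]))  # length of the returned vector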