The Multilingual-E5 models are initialized from XLM-RoBERTa and accept up to 512 tokens per input; any longer text is silently truncated. For best results, always prefix inputs with “query:” or “passage:”, because the models were explicitly trained with this format.
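As a minimal sketch of the prefixing rule, the helper below prepends the role prefix before a text is sent for embedding. The function name `with_prefix` is hypothetical and not part of any DeepInfra or E5 library; the single space after the colon is an assumption based on the usual E5 input format. It does not check the 512-token limit, which would require the model's tokenizer.

```python
def with_prefix(text: str, role: str) -> str:
    """Prepend the E5 role prefix ("query" or "passage") to a text.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if role not in ("query", "passage"):
        raise ValueError(f"role must be 'query' or 'passage', got {role!r}")
    # Assumed format: "<role>: <text>" with a single space after the colon.
    return f"{role}: {text.strip()}"


# Queries and the documents they are matched against get different prefixes.
question = with_prefix("how much protein should an adult eat per day", "query")
document = with_prefix("Protein requirements vary with age and activity.", "passage")
```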
DeepInfra supports the OpenAI embeddings API. The following request creates an embedding vector representing the input text:
curl "https://api.deepinfra.com/v1/openai/embeddings" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPINFRA_TOKEN" \
  -d '{
    "input": "The food was delicious and the waiter...",
    "model": "intfloat/multilingual-e5-large-instruct",
    "encoding_format": "float"
  }'
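The same request can be sketched in Python using only the standard library. This mirrors the curl command above rather than any official client; the function names `build_request` and `embed` are hypothetical, and the token is assumed to be in the `DEEPINFRA_TOKEN` environment variable, as in the curl example.

```python
import json
import os
import urllib.request

API_URL = "https://api.deepinfra.com/v1/openai/embeddings"
MODEL = "intfloat/multilingual-e5-large-instruct"


def build_request(text: str, token: str) -> urllib.request.Request:
    """Build the same POST request as the curl command (hypothetical helper)."""
    payload = {"input": text, "model": MODEL, "encoding_format": "float"}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def embed(text: str) -> dict:
    """Send the request and return the parsed JSON response."""
    request = build_request(text, os.environ["DEEPINFRA_TOKEN"])
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `embed("The food was delicious and the waiter...")` performs the network request; `build_request` alone can be used to inspect the payload without sending anything.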
The response will look similar to the following:
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        -0.010480394586920738,
        -0.0026091758627444506,
        ...
        0.031979579478502274,
        0.02021978422999382
      ]
    }
  ],
  "model": "intfloat/multilingual-e5-large-instruct",
  "usage": {
    "prompt_tokens": 12,
    "total_tokens": 12
  }
}
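To show how a response of this shape might be consumed, the sketch below pulls the vectors out of the `data` list in `index` order and reads the token count from `usage`. The helper name `extract_embeddings` is hypothetical, and the short vectors in the sample are placeholder values, not real model output.

```python
def extract_embeddings(response: dict) -> list:
    """Return the embedding vectors from a parsed response, ordered by index.

    Hypothetical helper: assumes the response layout shown in the sample.
    """
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Placeholder response with shortened, made-up vectors for illustration.
sample_response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
    ],
    "model": "intfloat/multilingual-e5-large-instruct",
    "usage": {"prompt_tokens": 12, "total_tokens": 12},
}

vectors = extract_embeddings(sample_response)
tokens_used = sample_response["usage"]["total_tokens"]
```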
service_tier
string. The service tier used for processing the request. When set to 'priority', the request is processed with higher priority.
Allowed values: default, priority
dimensions
integer. The number of dimensions in the embedding. If not provided, the model's default is used. If the value is larger than the model's default, the embedding is padded with zeros.
Range: 32 ≤ dimensions
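The optional parameters above could be added to the request body as sketched below. The function `build_payload` is hypothetical; the validation only mirrors the constraints listed here (allowed `service_tier` values, `dimensions` of at least 32) and the server remains the authority on what it accepts.

```python
def build_payload(text, model, dimensions=None, service_tier=None) -> dict:
    """Build an embeddings request body with optional parameters.

    Hypothetical helper: checks only the constraints documented above.
    """
    payload = {"input": text, "model": model, "encoding_format": "float"}
    if dimensions is not None:
        if dimensions < 32:
            raise ValueError("dimensions must be at least 32")
        payload["dimensions"] = dimensions
    if service_tier is not None:
        if service_tier not in ("default", "priority"):
            raise ValueError("service_tier must be 'default' or 'priority'")
        payload["service_tier"] = service_tier
    return payload


payload = build_payload(
    "query: best pizza in town",
    "intfloat/multilingual-e5-large-instruct",
    dimensions=256,
    service_tier="priority",
)
```

Omitting both optional arguments produces the same body as the curl example, so the model's default dimensionality and the default service tier apply.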