The Qwen3 Embedding model series is the latest model series in the Qwen family, designed specifically for text embedding and ranking tasks. Built on the dense foundation models of the Qwen3 series, it provides a range of text embedding and reranking models in three sizes (0.6B, 4B, and 8B).
You can use cURL or any other HTTP client to run inference:
curl -X POST \
-d '{"queries": ["What is the capital of United States of America?"], "documents": ["The capital of USA is Washington DC."]}' \
-H "Authorization: bearer $DEEPINFRA_TOKEN" \
-H 'Content-Type: application/json' \
'https://api.deepinfra.com/v1/inference/Qwen/Qwen3-Reranker-4B'
which returns a response similar to the following (the values shown are placeholders):
{
  "scores": [
    0.1,
    0.2,
    0.3
  ],
  "input_tokens": 42,
  "request_id": null,
  "inference_status": {
    "status": "unknown",
    "runtime_ms": 0,
    "cost": 0.0,
    "tokens_generated": 0,
    "tokens_input": 0
  }
}
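The request and response above can also be handled from Python using only the standard library. The following is a minimal sketch, not an official DeepInfra client: `build_rerank_request`, `rank_documents`, and `rerank` are hypothetical helper names. It assumes the endpoint accepts the same JSON body and headers as the cURL example, and that a single query is scored against each document with `scores` returned in document order; the source does not spell out that mapping, so verify it against real responses.

```python
import json
import os
import urllib.request

API_URL = "https://api.deepinfra.com/v1/inference/Qwen/Qwen3-Reranker-4B"


def build_rerank_request(queries, documents, token):
    """Build (but do not send) a POST request mirroring the cURL example."""
    body = json.dumps({"queries": queries, "documents": documents}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
    )


def rank_documents(documents, response):
    """Pair each document with its score and sort best-first.

    Assumption: response["scores"][i] is the score for documents[i].
    """
    scores = response["scores"]
    if len(scores) != len(documents):
        raise ValueError(f"expected {len(documents)} scores, got {len(scores)}")
    return sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)


def rerank(query, documents, token):
    """Send one query with its candidate documents and return them ranked."""
    req = build_rerank_request([query], documents, token)
    with urllib.request.urlopen(req) as resp:
        response = json.loads(resp.read().decode("utf-8"))
    return rank_documents(documents, response)


# Only call the live API when a token is actually available.
if __name__ == "__main__" and "DEEPINFRA_TOKEN" in os.environ:
    ranked = rerank(
        "What is the capital of United States of America?",
        ["The capital of USA is Washington DC.", "Paris is the capital of France."],
        os.environ["DEEPINFRA_TOKEN"],
    )
    for doc, score in ranked:
        print(f"{score:.3f}  {doc}")
```

Only the `scores` field is used here; `input_tokens` and `inference_status` are ignored, though they may be useful for logging or cost tracking.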
instruction (string)
Instruction for the reranker model, providing additional context or guidance for the reranking task.
Default value: "Given a web search query, retrieve relevant passages that answer the query"
webhook (file)
The webhook to call when inference is done. By default, the output is returned in the response to your inference request.
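The two optional parameters can be added to the request body alongside `queries` and `documents`. This is a sketch under stated assumptions: `build_payload` is a hypothetical helper, the custom instruction text and the `example.com` webhook URL are made up for illustration, and it assumes `webhook` is passed as a callback URL string even though its listed type is `file`. Check the service documentation before relying on that.

```python
import json


def build_payload(queries, documents, instruction=None, webhook=None):
    """Build the JSON request body, adding optional fields only when set.

    Leaving `instruction` unset lets the service apply its documented default:
    "Given a web search query, retrieve relevant passages that answer the query".
    Assumption: `webhook` is sent as a callback URL string in the body.
    """
    payload = {"queries": queries, "documents": documents}
    if instruction is not None:
        payload["instruction"] = instruction
    if webhook is not None:
        payload["webhook"] = webhook
    return json.dumps(payload)


body = build_payload(
    ["Which statute covers data retention?"],
    ["Section 12 requires records to be kept for five years."],
    # Illustrative domain-specific instruction replacing the web-search default.
    instruction="Given a legal question, retrieve passages that cite the relevant statute",
    webhook="https://example.com/hooks/rerank-done",  # hypothetical endpoint
)
print(body)
```

Omitting a field rather than sending `null` keeps the server-side default in effect, which is usually the safer choice for optional inference parameters.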