To use DeepInfra's API, you'll need an API key.
You'll use this API key in your requests to authenticate with our services.
Whisper is a speech-to-text model from OpenAI. Given an audio file containing speech, it produces a transcript. By default, the transcript is segmented by sentence, with a timestamp for each segment. The model comes in several sizes (small, base, large, etc.), including English-only variants; see deepinfra.com for more. We also host whisper-timestamped, which can provide timestamps for individual words in the audio. You can call it through our REST API. Here's how:
curl -X POST \
-F "audio=@/home/user/all-in-01.mp3" \
-H "Authorization: Bearer YOUR_API_KEY" \
'https://api.deepinfra.com/v1/inference/openai/whisper-timestamped-medium.en'
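For repeated use, the same request can be wrapped in a small shell function. The following is a minimal sketch, not an official client: the function names `whisper_endpoint` and `transcribe` and the environment variable `DEEPINFRA_API_KEY` are hypothetical names chosen for this example. Only the endpoint URL, the `audio` form field, and the `Authorization` header come from the command above.

```shell
#!/usr/bin/env bash
# Sketch only: function names and DEEPINFRA_API_KEY are assumptions made
# for this example, not part of the DeepInfra API.

# Build the inference URL for a given model name.
whisper_endpoint() {
  printf 'https://api.deepinfra.com/v1/inference/%s' "$1"
}

# Upload an audio file and print the raw response body to stdout.
# $1: path to the audio file
# $2: model name (defaults to the model used in the command above)
transcribe() {
  local audio_file="$1"
  local model="${2:-openai/whisper-timestamped-medium.en}"

  # Read the key from the environment instead of hard-coding it.
  if [ -z "${DEEPINFRA_API_KEY:-}" ]; then
    echo "DEEPINFRA_API_KEY is not set" >&2
    return 1
  fi
  if [ ! -f "$audio_file" ]; then
    echo "audio file not found: $audio_file" >&2
    return 1
  fi

  # -sS hides the progress meter but still prints errors;
  # --fail makes curl exit non-zero on HTTP error responses.
  curl -sS --fail -X POST \
    -F "audio=@${audio_file}" \
    -H "Authorization: Bearer ${DEEPINFRA_API_KEY}" \
    "$(whisper_endpoint "$model")"
}
```

After sourcing the file and exporting `DEEPINFRA_API_KEY`, running `transcribe /home/user/all-in-01.mp3` sends the same request as the command above. Keeping the key in the environment avoids pasting it into shell history or scripts.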
For additional parameters and details on calling this model, see its documentation page, which has the complete API reference and examples.
If you have any questions, just reach out to us on our Discord server.
© 2025 Deep Infra. All rights reserved.