DeepInfra raises $107M Series B to scale the inference cloud — read the announcement

To use DeepInfra's API, you'll need an API key.
You'll use this API key in your requests to authenticate with our services.
Whisper is a Speech-To-Text model from OpenAI. Given an audio file with voice data it produces human speech recognition text with per sentence timestamps. There are different model sizes (small, base, large, etc.) and variants for English, see more at deepinfra.com. By default, Whisper produces by sentence timestamp segmentation. We also host whisper-timestamped that can provide timestamps for words in the audio. You can use it with our REST API. Here's how to use it:
curl -X POST \
-F "audio=@/home/user/all-in-01.mp3" \
-H "Authorization: Bearer YOUR_API_KEY" \
'https://api.deepinfra.com/v1/inference/openai/whisper-timestamped-medium.en'
To see additional parameters and how to call this model, check out the documentation page for complete API reference and examples.
If you have any question, just reach out to us on our Discord server.
GLM-4.6 API: Get fast first tokens at the best $/M from Deepinfra's API - Deep Infra<p>GLM-4.6 is a high-capacity, “reasoning”-tuned model that shows up in coding copilots, long-context RAG, and multi-tool agent loops. With this class of workload, provider infrastructure determines perceived speed (first-token time), tail stability, and your unit economics. Using ArtificialAnalysis (AA) provider charts for GLM-4.6 (Reasoning), DeepInfra (FP8) pairs a sub-second Time-to-First-Token (TTFT) (0.51 s) with the […]</p>
DeepSeek V4 Pro: Model Overview, Features & Performance Guide<p>DeepSeek V4 Pro is a 1.6-trillion parameter Mixture-of-Experts (MoE) model from DeepSeek, released on April 24, 2026 under the MIT license. It is designed for advanced reasoning, complex software engineering, and long-running agentic tasks, and arrives alongside DeepSeek-V4-Flash, a lighter 284B-parameter variant built for faster, lower-cost inference. The V4 series is DeepSeek’s first two-tier lineup […]</p>
OpenCode: Open-Source Claude Code Alternative<p>Open your cloud bill after a month of heavy agent use and the number stops being abstract. Teams report coding-assistant costs in the hundreds of dollars per developer, and some now set token budgets the way they once rationed cloud compute. Then in June 2026 the US government barred non-Americans from Anthropic’s Fable 5, and […]</p>
© 2026 DeepInfra. All rights reserved.