FLUX.2 is live! High-fidelity image generation made simple.

To use DeepInfra's services, you'll need an API key. You can get one by signing up on our platform.
Your API key will be used to authenticate all your requests to the DeepInfra API.
Now lets actually deploy some models to production and use them for inference. It is really easy.
You can deploy models through the web dashboard or by using our API. Models are automatically deployed when you first make an inference request.
Once a model is deployed on DeepInfra, you can use it with our REST API. Here's how to use it with curl:
curl -X POST \
-F "audio=@/path/to/audio.mp3" \
-H "Authorization: Bearer YOUR_API_KEY" \
'https://api.deepinfra.com/v1/inference/openai/whisper-small'
Deep Infra Launches Access to NVIDIA Nemotron Models for Vision, Retrieval, and AI SafetyDeep Infra is serving the new, open NVIDIA Nemotron vision language and OCR AI models from day zero of their release. As a leading inference provider committed to performance and cost-efficiency, we're making these cutting-edge models available at the industry's best prices, empowering developers to build specialized AI agents without compromising on budget or performance.
How to OpenAI Whisper with per-sentence and per-word timestamp segmentation using DeepInfraWhisper is a Speech-To-Text model from OpenAI.
Introducing GPU Instances: On-Demand GPU Compute for AI WorkloadsLaunch dedicated GPU containers in minutes with our new GPU Instances feature, designed for machine learning training, inference, and compute-intensive workloads.© 2026 Deep Infra. All rights reserved.