Run the top AI models using a simple API and pay per use. Low-cost, scalable, production-ready infrastructure.
On-Demand GPU instances with full SSH access for your AI workloads.
SSH access for full control, available in just 10 seconds
1-Click Deployment, On-demand GPU access
Flexible hourly billing
Rent GPU Instances
Sign up for a Deep Infra account or log in using GitHub
Choose among hundreds of the most popular ML models
Use a simple REST API to call your model.
Models are deployed in multiple regions
Close to the user
Fast network
Autoscaling
Share resources
Pay per use
Simple pricing
No ML Ops needed
Better cost efficiency
Hassle-free ML infrastructure
Fast scaling infrastructure
Maintain low latency
Scale down when not needed
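Calling a deployed model over the REST API can be sketched as below. The endpoint URL, header names, and payload shape are assumptions based on a typical OpenAI-compatible chat API, not confirmed specifics; replace them with the values from your dashboard.

```python
import json

# Assumed OpenAI-compatible chat endpoint -- verify against the API docs.
API_URL = "https://api.deepinfra.com/v1/openai/chat/completions"

def build_chat_request(model: str, prompt: str, api_key: str) -> dict:
    """Assemble the URL, headers, and JSON body for a chat-completion call."""
    return {
        "url": API_URL,
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

req = build_chat_request("meta-llama/Llama-3.1-8B-Instruct", "Hello!", "YOUR_API_KEY")
# To send it, e.g.:
#   requests.post(req["url"], headers=req["headers"], data=req["body"])
```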
Run costs
Model | Context | $ per 1M input tokens | $ per 1M output tokens |
---|---|---|---|
mixtral-8x7B-chat | 32k | $0.08 | $0.24 |
wizardLM-2-8x22B | 64k | $0.48 | $0.48 |
Llama-3-8B-Instruct | 8k | $0.03 | $0.06 |
Mistral-7B-v3 | 32k | $0.028 | $0.054 |
MythoMax-L2-13b | 4k | $0.072 | $0.072 |
Llama-3-70B-Instruct | 8k | $0.30 | $0.40 |
Llama-3.1-70B-Instruct | 128k | $0.23 | $0.40 |
Llama-3.1-8B-Instruct | 128k | $0.03 | $0.05 |
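The per-million-token prices above translate into a request cost like this (a minimal sketch; token counts are illustrative):

```python
def token_cost(input_tokens: int, output_tokens: int,
               usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Cost in USD, given per-1M-token prices for input and output."""
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000

# Example: 200k input + 50k output tokens on Llama-3.1-70B-Instruct
# at $0.23 / $0.40 per 1M tokens: 0.046 + 0.020 = 0.066 USD
cost = token_cost(200_000, 50_000, 0.23, 0.40)
```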
You can deploy your own model on our hardware and pay for uptime. You get dedicated SXM-connected GPUs (for multi-GPU setups), automatic scaling to handle load fluctuations, and a very competitive price.
GPU | Price |
---|---|
Nvidia A100 GPU | $0.89/GPU-hour |
Nvidia H100 GPU | $1.69/GPU-hour |
Nvidia H200 GPU | $1.99/GPU-hour |
Dedicated A100-80GB, H100-80GB & H200-141GB GPUs for your custom LLM needs
Billed at minute granularity
Invoiced weekly
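With minute-granularity billing, the cost of a dedicated reservation works out as below (a sketch; the exact rounding rule on invoices is an assumption):

```python
def gpu_cost(minutes: float, usd_per_gpu_hour: float, num_gpus: int = 1) -> float:
    """USD cost for a reservation billed at minute granularity.

    Rounding to cents here is illustrative, not a confirmed invoicing rule.
    """
    return round(num_gpus * (minutes / 60.0) * usd_per_gpu_hour, 2)

# Example: two H100s at $1.69/GPU-hour running for 90 minutes:
# 2 * 1.5 h * $1.69 = $5.07
total = gpu_cost(90, 1.69, num_gpus=2)
```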
Rent high-performance B200 GPU instances with SSH access for your AI workloads. Manage your instances in the dashboard.
GPU | Price |
---|---|
Nvidia B200 GPU | $2.49/GPU-hour |
SSH access for full control
High-performance B200 GPU instances
Flexible hourly billing
For dedicated instances, DGX H100, and B200 clusters with 3.2Tbps bandwidth, please contact us at dedicated@deepinfra.com
Model | Context | $ per 1M input tokens |
---|---|---|
bge-large-en-v1.5 | 512 | $0.01 |
bge-base-en-v1.5 | 512 | $0.005 |
e5-large-v2 | 512 | $0.01 |
e5-base-v2 | 512 | $0.005 |
gte-large | 512 | $0.01 |
gte-base | 512 | $0.005 |
All models run on H100 or A100 GPUs, optimized for inference performance and low latency.
Our system automatically scales the model onto more hardware based on your needs. We limit each account to 200 concurrent requests; if you want more, drop us a line.
You must add a card or pre-pay before you can use our services. An invoice is generated at the beginning of each month, and also mid-month whenever you hit your tier's invoicing threshold. You can also set a spending limit to avoid surprises.
Every user is part of a usage tier. As your usage and spending go up, we automatically move you to the next usage tier. Every tier has an invoicing threshold; once it is reached, an invoice is automatically generated.
Tier | Qualification | Invoicing Threshold |
---|---|---|
Tier 1 | — | $20 |
Tier 2 | $100 paid | $100 |
Tier 3 | $500 paid | $500 |
Tier 4 | $2,000 paid | $2,000 |
Tier 5 | $10,000 paid | $10,000 |
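The tier mechanics can be sketched as a lookup. The rule below (your tier is the highest one whose qualification amount you have paid so far) is an assumption drawn from the table above, not a confirmed billing algorithm:

```python
TIERS = [
    # (tier, cumulative USD paid to qualify, invoicing threshold in USD)
    (1, 0, 20),
    (2, 100, 100),
    (3, 500, 500),
    (4, 2_000, 2_000),
    (5, 10_000, 10_000),
]

def tier_for(total_paid_usd: float) -> tuple:
    """Return (tier, invoicing threshold) for a cumulative amount paid."""
    tier, _, threshold = max(t for t in TIERS if total_paid_usd >= t[1])
    return tier, threshold

tier_for(0)    # new accounts start in Tier 1 with a $20 threshold
tier_for(650)  # $650 paid qualifies for Tier 3; invoiced every $500
```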
Run models at scale with our fully managed GPU infrastructure, delivering enterprise-grade uptime at the industry's best rates.
© 2025 Deep Infra. All rights reserved.