


Fast ML Inference, Simple API

Run the top AI models using a simple API and pay per use. Low-cost, scalable, production-ready infrastructure.

SOC 2 Certified · ISO 27001 Certified

B200 for $2.49/GPU-hour

On-Demand GPU instances with full SSH access for your AI workloads.

SSH access for full control, available in just 10 seconds

1-Click Deployment, On-demand GPU access

Flexible hourly billing

Rent GPU Instances


Featured models:

What we loved, used and implemented the most last month:

View all models

How to deploy Deep Infra in seconds

A powerful, self-serve machine learning platform that lets you turn models into scalable APIs in just a few clicks.
Use our API

Sign up for a Deep Infra account using GitHub, or log in with GitHub.

Deploy a model

Choose among hundreds of the most popular ML models

Call Your Model in Production

Use a simple REST API to call your model.
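Once a model is deployed, calling it is a plain HTTPS request. A minimal sketch in Python, assuming Deep Infra's OpenAI-compatible chat endpoint; the model name, prompt, and API key below are placeholders — check the dashboard for your actual values:

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible chat endpoint; verify against your dashboard.
API_URL = "https://api.deepinfra.com/v1/openai/chat/completions"

def build_chat_request(model, prompt, api_key):
    """Build the HTTP request for a single chat completion call."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        API_URL, data=json.dumps(payload).encode(), headers=headers
    )

req = build_chat_request(
    "meta-llama/Meta-Llama-3-8B-Instruct",  # any model from the catalog
    "Say hello in one word.",
    os.environ.get("DEEPINFRA_API_KEY", "YOUR_API_KEY"),
)

# Only send the request when a real key is configured:
if os.environ.get("DEEPINFRA_API_KEY"):
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

The same request works from any language with an HTTP client, and OpenAI SDK clients can be pointed at the compatible base URL instead.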


Deep Infra Benefits

Deploy models to production faster and cheaper with our serverless GPUs than by building the infrastructure yourself.
Low Latency
  • Model is deployed in multiple regions

  • Close to the user

  • Fast network

  • Autoscaling

Cost Effective
  • Share resources

  • Pay per use

  • Simple pricing

Serverless
  • No ML Ops needed

  • Better cost efficiency

  • Hassle free ML infrastructure

Simple
  • No ML Ops needed

  • Better cost efficiency

  • Hassle free ML infrastructure

Auto Scaling
  • Fast scaling infrastructure

  • Maintain low latency

  • Scale down when not needed

Run costs

Simple Pricing, Deep Infrastructure

We have different pricing models depending on the model used. Some of our language models offer per-token pricing. Most other models are billed for inference execution time. With this pricing model, you only pay for what you use. There are no long-term contracts or upfront costs, and you can easily scale up and down as your business needs change.

Token Pricing

Llama-3.1-405B-Instruct: $0.80 / 1M input tokens

Model                    Context   $ per 1M input tokens   $ per 1M output tokens
mixtral-8x7B-chat        32k       $0.08                   $0.24
wizardLM-2-8x22B         64k       $0.48                   $0.48
Llama-3-8B-Instruct      8k        $0.03                   $0.06
Mistral-7B-v3            32k       $0.028                  $0.054
MythoMax-L2-13b          4k        $0.072                  $0.072
Llama-3-70B-Instruct     8k        $0.30                   $0.40
Llama-3.1-70B-Instruct   128k      $0.23                   $0.40
Llama-3.1-8B-Instruct    128k      $0.03                   $0.05
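Per-token billing is easy to estimate up front. A small sketch using the Llama-3.1-8B-Instruct rates from the table above; the token counts in the example are hypothetical:

```python
def token_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Dollar cost of one request under per-token pricing.

    Prices are quoted in $ per 1M tokens, so divide by 1,000,000.
    """
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000

# Llama-3.1-8B-Instruct: $0.03 / 1M input, $0.05 / 1M output
cost = token_cost(2_000, 500, 0.03, 0.05)
print(f"${cost:.6f}")  # → $0.000085
```

At these rates, a million such requests would cost about $85 — which is why sharing resources and paying per use beats keeping a GPU idle for bursty traffic.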
LoRA-tuned Llama Models


You can deploy your own model on our hardware and pay for uptime. You get dedicated SXM-connected GPUs (for multi-GPU setups), automatic scaling to handle load fluctuations and a very competitive price. Read More

GPU               Price
Nvidia A100 GPU   $0.89/GPU-hour
Nvidia H100 GPU   $1.69/GPU-hour
Nvidia H200 GPU   $1.99/GPU-hour
Deploy
  • Dedicated A100-80GB, H100-80GB & H200-141GB GPUs for your custom LLM needs

  • Billed in minute granularity

  • Invoiced weekly
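Because dedicated deployments are billed at minute granularity, uptime cost is a simple calculation. A sketch using the hourly rates from the table above (the 2×H100, 90-minute example is hypothetical):

```python
# $/GPU-hour, from the dedicated-GPU pricing table
HOURLY_RATES = {
    "A100": 0.89,
    "H100": 1.69,
    "H200": 1.99,
}

def deployment_cost(gpu, num_gpus, uptime_minutes):
    """Cost of a dedicated deployment billed per minute of uptime."""
    return HOURLY_RATES[gpu] / 60 * num_gpus * uptime_minutes

# e.g. 2x H100 up for 90 minutes
print(f"${deployment_cost('H100', 2, 90):.2f}")  # → $5.07
```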

Rent high-performance B200 GPU instances with SSH access for your AI workloads. Manage your instances in the dashboard.

GPU               Price
Nvidia B200 GPU   $2.49/GPU-hour
Rent GPU Instances
  • SSH access for full control

  • High-performance B200 GPU instances

  • Flexible hourly billing

Dedicated Instances and Clusters

For dedicated instances, DGX H100, and B200 clusters with 3.2Tbps bandwidth, please contact us at dedicated@deepinfra.com


Model               Context   $ per 1M input tokens
bge-large-en-v1.5   512       $0.01
bge-base-en-v1.5    512       $0.005
e5-large-v2         512       $0.01
e5-base-v2          512       $0.005
gte-large           512       $0.01
gte-base            512       $0.005
Hardware

All models run on H100 or A100 GPUs, optimized for inference performance and low latency.

Auto Scaling

Our system automatically scales the model to more hardware based on your needs. We limit each account to 200 concurrent requests. If you need more, drop us a line.

Billing

You must add a card or pre-pay before you can use our services. An invoice is always generated at the beginning of the month, and also throughout the month if you hit your tier's invoicing threshold. You can also set a spending limit to avoid surprises.

Usage Tiers

Every user is part of a usage tier. As your usage and spending go up, we automatically move you to the next usage tier. Every tier has an invoicing threshold; once it is reached, an invoice is automatically generated.

Tier     Qualification    Invoicing threshold
Tier 1   —                $20
Tier 2   $100 paid        $100
Tier 3   $500 paid        $500
Tier 4   $2,000 paid      $2,000
Tier 5   $10,000 paid     $10,000
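The tier progression above can be sketched as a lookup from total amount paid. The qualification and threshold values are read from the table; the exact promotion rules are an assumption for illustration:

```python
# (qualification: total $ paid, invoicing threshold) per tier, from the table.
# Tier 1 is assumed to be the default starting tier (qualification $0).
TIERS = [
    (0, 20),           # Tier 1
    (100, 100),        # Tier 2
    (500, 500),        # Tier 3
    (2_000, 2_000),    # Tier 4
    (10_000, 10_000),  # Tier 5
]

def invoicing_threshold(total_paid):
    """Return (tier number, invoicing threshold) for a given total spend."""
    tier = max(i for i, (qual, _) in enumerate(TIERS) if total_paid >= qual)
    return tier + 1, TIERS[tier][1]

print(invoicing_threshold(0))    # → (1, 20)
print(invoicing_threshold(750))  # → (3, 500)
```

So a new account is invoiced every $20 of usage, while an account that has paid $750 in total accrues up to $500 before an invoice is generated.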

Unlock the most affordable AI hosting

Run models at scale with our fully managed GPU infrastructure, delivering enterprise-grade uptime at the industry's best rates.