We use essential cookies to make our site work. With your consent, we may also use non-essential cookies to improve user experience and analyze website traffic…

Qwen3-Max-Thinking state-of-the-art reasoning model at your fingertips!

How to deploy Databricks Dolly v2 12b, instruction tuned casual language model.
Published on 2023.04.12 by Yessen Kanapin
How to deploy Databricks Dolly v2 12b, instruction tuned casual language model.

Databricks Dolly is instruction tuned 12 billion parameter casual language model based on EleutherAI's pythia-12b. It was pretrained on The Pile, GPT-J's pretraining corpus. databricks-dolly-15k open source instruction following dataset was used to tune the model.

Getting started

To get started, you'll need an API key from DeepInfra.

  1. Sign up or log in to your DeepInfra account
  2. Navigate to the Dashboard / API Keys section
  3. Create a new API key if you don't have one already

Deployment

You can deploy the databricks/dolly-v2-12b model easily through the web dashboard or by using our API. The model will be automatically deployed when you first run an inference request.

Inference

You can use it with our REST API. Here's how to call the model using curl:

curl -X POST \
    -d '{"prompt": "Who is Elvis Presley?"}' \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer YOUR_API_KEY" \
    'https://api.deepinfra.com/v1/inference/databricks/dolly-v2-12b'
copy

We charge per inference request execution time, $0.0005 per second. Inference runs on Nvidia A100 cards. To see the full documentation of how to call this model, check out the model page on our website.

You can browse all available models on our models page.

If you have any question, just reach out to us on our Discord server.

Related articles
Guaranteed JSON output on Open-Source LLMs.Guaranteed JSON output on Open-Source LLMs.DeepInfra is proud to announce that we have released "JSON mode" across all of our text language models. It is available through the "response_format" object, which currently supports only {"type": "json_object"} Our JSON mode will guarantee that all tokens returned in the output of a langua...
Best API for Kimi K2.5: Why DeepInfra Leads in Speed, TTFT, and ScalabilityBest API for Kimi K2.5: Why DeepInfra Leads in Speed, TTFT, and Scalability<p>Kimi K2.5 is positioned as Moonshot AI’s “do-it-all” model for modern product workflows: native multimodality (text + vision/video), Instant vs. Thinking modes, and support for agentic / multi-agent (“swarm”) execution patterns. In real applications, though, model capability is only half the story. The provider’s inference stack determines the things your users actually feel: time-to-first-token (TTFT), [&hellip;]</p>
Model Distillation Making AI Models EfficientModel Distillation Making AI Models EfficientAI Model Distillation Definition & Methodology Model distillation is the art of teaching a smaller, simpler model to perform as well as a larger one. It's like training an apprentice to take over a master's work—streamlining operations with comparable performance . If you're struggling with depl...