How to deploy google/flan-ul2 the simple way (open source ChatGPT alternative)
Published on 2023.03.17 by Nikola Borisov

Flan-UL2 is probably the best open source model available right now for chatbots. In this post we will show you how to get started with it easily. Flan-UL2 is large, with 20B parameters; it is a fine-tuned version of the UL2 model, trained on the Flan dataset. Because the model is this large, it is not easy to deploy on your own machine.

If you rent a GPU in AWS, it will cost you around $1.5 per hour, or about $1080 per month if it runs around the clock ($1.5 × 720 hours in a 30-day month). With DeepInfra model deployments you only pay for inference time, and we do not charge for cold starts. Our pricing is $0.0005 per second of inference on an NVIDIA A100, which translates to about $0.0001 per token generated by Flan-UL2.
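To make the pricing comparison concrete, here is a small sketch that redoes the arithmetic with awk, taking the figures above at face value and assuming a 30-day month. Dividing the per-second price by the per-token price gives the generation speed those two numbers imply, and dividing the monthly GPU rental by the per-second price gives a break-even point: the hours of actual inference per month at which paying per second costs the same as renting a GPU outright. The function names are illustrative only, not part of any DeepInfra tool.

```shell
# Back-of-envelope cost comparison using the figures quoted in this post.
# Assumption: a 30-day month (24 * 30 = 720 hours) for an always-on GPU.

# Monthly cost of a GPU rented 24/7 at the given hourly price (USD).
monthly_gpu_cost() {
  awk -v hourly="$1" 'BEGIN { printf "%.2f\n", hourly * 24 * 30 }'
}

# If one second of inference costs per_sec and one token costs per_tok,
# the model must be producing per_sec / per_tok tokens each second.
implied_tokens_per_sec() {
  awk -v per_sec="$1" -v per_tok="$2" 'BEGIN { printf "%.1f\n", per_sec / per_tok }'
}

# Hours of pay-per-second inference that cost the same as one month of rental.
breakeven_inference_hours() {
  awk -v monthly="$1" -v per_sec="$2" 'BEGIN { printf "%.1f\n", monthly / per_sec / 3600 }'
}
```

With these numbers, `monthly_gpu_cost 1.5` gives 1080.00, `implied_tokens_per_sec 0.0005 0.0001` gives 5.0 tokens per second, and `breakeven_inference_hours 1080 0.0005` gives 600.0. In other words, paying per second only becomes more expensive than a rented GPU if the model is busy for more than about 600 of the month's 720 hours.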

Also check out the model page at https://deepinfra.com/google/flan-ul2, where you can run inferences and read the docs/API reference for calling the model via curl.

Getting started

First, you'll need to get an API key from the DeepInfra dashboard.

  1. Sign up or log in to your DeepInfra account
  2. Navigate to the API Keys section in the dashboard
  3. Create a new API key for authentication
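Once you have a key, a convenient pattern is to store it in an environment variable for your session instead of pasting it into every command. The helpers below are a minimal sketch of that idea; the variable name `DEEPINFRA_API_KEY` and the function names are assumptions made for this example, not names DeepInfra requires.

```shell
# Hypothetical helpers: they read the key from DEEPINFRA_API_KEY,
# a variable name chosen for this example.

# Fail early, with a clear message, if the key has not been set.
require_api_key() {
  if [ -z "${DEEPINFRA_API_KEY:-}" ]; then
    echo "DEEPINFRA_API_KEY is not set; create a key in the dashboard first" >&2
    return 1
  fi
}

# Print the Authorization header value used by the curl request.
auth_header() {
  require_api_key || return 1
  printf 'Authorization: Bearer %s' "$DEEPINFRA_API_KEY"
}
```

For example, run `export DEEPINFRA_API_KEY="YOUR_API_KEY"` once per shell session, then pass `-H "$(auth_header)"` to curl instead of writing the header by hand.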

Deployment

Deploying the google/flan-ul2 model takes no extra work: it is deployed automatically the first time you make an inference request, whether from the web dashboard or through the API.

Inference

You can use it with our REST API. Here's how to use it with curl:

```shell
# Replace YOUR_API_KEY with the key you created in the dashboard.
curl -X POST \
    -d '{"prompt": "Hello, how are you?"}' \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer YOUR_API_KEY" \
    'https://api.deepinfra.com/v1/inference/google/flan-ul2'
```
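Writing the JSON body by hand gets fragile as soon as a prompt contains quotes. The sketch below wraps the same request in a shell function that escapes the prompt first. It is a minimal example under stated assumptions: it reads the key from an environment variable named `DEEPINFRA_API_KEY` (a name chosen for this example), only escapes backslashes and double quotes (so it is meant for single-line prompts), and prints the raw JSON response without parsing it. The function names are hypothetical.

```shell
# Escape backslashes and double quotes so the prompt is valid inside a
# JSON string. Sketch only: newlines and other control characters are
# not handled, so use it for single-line prompts.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the same request body as the curl example: {"prompt": "..."}
build_payload() {
  printf '{"prompt": "%s"}' "$(json_escape "$1")"
}

# Send one prompt to google/flan-ul2 and print the raw JSON response.
# --fail makes curl exit non-zero on HTTP errors (status 400 and above).
ask_flan_ul2() {
  if [ -z "${DEEPINFRA_API_KEY:-}" ]; then
    echo "set DEEPINFRA_API_KEY to your DeepInfra API key" >&2
    return 1
  fi
  curl -sS --fail -X POST \
    -d "$(build_payload "$1")" \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer $DEEPINFRA_API_KEY" \
    'https://api.deepinfra.com/v1/inference/google/flan-ul2'
}
```

For example, `ask_flan_ul2 'Say "hello" in French.'` sends the quoted prompt correctly, where the hand-written `-d '{"prompt": ...}'` string would have needed manual escaping.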

For the full documentation on calling this model, see its model page on the DeepInfra website or the API documentation.

For a list of all the models you can use on DeepInfra, visit the models page on our website or use the API to list the available models.

There is no easier way to get started with arguably one of the best open source LLMs. That was easy, right? You did not have to deal with Docker, Transformers, PyTorch, etc. If you have any questions, just reach out to us on our Discord server.
