
Recently an interesting new model was released: Lzlv, which is essentially a merge of a few existing models. It uses the Vicuna prompt format, so keep this in mind if you are using our raw API. If you are using the OpenAI-compatible API, we take care of the prompt formatting for you.
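If you do build prompts yourself for the raw API, the Vicuna format wraps each turn in USER:/ASSISTANT: markers. Here is a minimal sketch of a prompt builder; the exact system preamble and separator are assumptions, so check the model card for the canonical template:

```python
# Sketch of a Vicuna-style prompt builder for the raw API.
# The system preamble below is an assumption; verify against the model card.
DEFAULT_SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant."
)

def build_vicuna_prompt(turns, system=DEFAULT_SYSTEM):
    # turns is a list of (role, text) pairs, role being "user" or "assistant"
    parts = [system]
    for role, text in turns:
        marker = "USER" if role == "user" else "ASSISTANT"
        parts.append(f"{marker}: {text}")
    # Trailing marker tells the model it is the assistant's turn to speak
    parts.append("ASSISTANT:")
    return " ".join(parts)

prompt = build_vicuna_prompt([("user", "Tell me a joke.")])
```

The resulting string can be sent as the "input" field of the raw inference API shown below.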
The Lzlv model has also received a lot of attention on Reddit, where it was tagged as the top choice in the 70B category: https://www.reddit.com/r/LocalLLaMA/comments/17fhp9k/huge_llm_comparisontest_39_models_tested_7b70b/ It seems to strike the right balance between creativity and coherence, making it a good choice for roleplay use cases. You can quickly try it out using our web chat UI, and if you like the results you can easily integrate it into your application using our APIs.
As always, you can easily access the model using our LLM APIs. For example, using our OpenAI-compatible API:
import openai

# Point the OpenAI client to our endpoint; get an API key at https://deepinfra.com/dash/api_keys
openai.api_key = "<YOUR DEEPINFRA TOKEN>"
openai.api_base = "https://api.deepinfra.com/v1/openai"

# Your chosen model here
MODEL_DI = "lizpreciatior/lzlv_70b_fp16_hf"

chat_completion = openai.ChatCompletion.create(
    model=MODEL_DI,
    messages=[{"role": "user", "content": "Hello there"}],
    stream=True,
    max_tokens=100,
)

# Print the streamed chat completion chunks
for event in chat_completion:
    print(event.choices)
You can also use our lower-level HTTP API directly. This lets you build your own prompt and integrate the model easily into any application.
curl -X POST \
-d '{"input": "Tell me a joke."}' \
-H "Authorization: bearer <YOUR DEEPINFRA API TOKEN>" \
-H 'Content-Type: application/json' \
'https://api.deepinfra.com/v1/inference/lizpreciatior/lzlv_70b_fp16_hf'
We have both streaming and non-streaming APIs available, and you can find more details here: https://deepinfra.com/lizpreciatior/lzlv_70b_fp16_hf/api
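The same call can be made from Python without any extra dependencies. The sketch below builds the request with the standard library's urllib, mirroring the curl example above; the actual send is commented out since it needs a real API token:

```python
import json
import urllib.request

# Build the same POST request as the curl example above,
# using only the Python standard library.
req = urllib.request.Request(
    "https://api.deepinfra.com/v1/inference/lizpreciatior/lzlv_70b_fp16_hf",
    data=json.dumps({"input": "Tell me a joke."}).encode(),
    headers={
        "Authorization": "bearer <YOUR DEEPINFRA API TOKEN>",
        "Content-Type": "application/json",
    },
)

# Uncomment to actually send the request (requires a valid token):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```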
This is a 70B model, and it is quite expensive to run. Typically you need at least a couple of A100 80GB GPUs, and for the best performance at least four high-end GPUs. Acquiring this hardware and setting everything up is expensive and time-consuming. Our APIs, however, are the most cost-effective way to run this model. For 70B models we charge 0.7 USD per 1M input tokens and 0.9 USD per 1M output tokens. The biggest saving is that you pay only for usage, not for idle time.
If you need any help, just reach out to us on our Discord server.