We use essential cookies to make our site work. With your consent, we may also use non-essential cookies to improve user experience and analyze website traffic…

FLUX.2 is live! High-fidelity image generation made simple.

Langchain improvements: async and streaming
Published on 2023.10.25 by Iskren Chernev
Langchain improvements: async and streaming

Starting from langchain v0.0.322 you can make efficient async generation and streaming tokens with deepinfra.

Async generation

The deepinfra wrapper now supports native async calls, so you can expect more performance (no more threads per invocation) from your async pipelines.

from langchain.llms.deepinfra import DeepInfra

async def async_predict():
    llm = DeepInfra(model_id="meta-llama/Llama-2-7b-chat-hf")
    output = await llm.apredict("What is 2 + 2?")
    print(output)
copy

Response streaming

Streaming lets you receive each token of the response as it gets generated. This is indispensable in user-facing applications.

def streaming():
    llm = DeepInfra(model_id="meta-llama/Llama-2-7b-chat-hf")
    for chunk in llm.stream("[INST] Hello [/INST] "):
        print(chunk, end='', flush=True)
    print()
copy

You can also use the asynchronous streaming API, natively implemented underneath.

async def async_streaming():
    llm = DeepInfra(model_id="meta-llama/Llama-2-7b-chat-hf")
    async for chunk in llm.astream("[INST] Hello [/INST] "):
        print(chunk, end='', flush=True)
    print()
copy
Related articles
Introducing Tool Calling with LangChain, Search the Web with Tavily and Tool Calling AgentsIntroducing Tool Calling with LangChain, Search the Web with Tavily and Tool Calling AgentsIn this blog post, we will query for the details of a recently released expansion pack for Elden Ring, a critically acclaimed game released in 2022, using the Tavily tool with the ChatDeepInfra model. Using this boilerplate, one can automate the process of searching for information with well-writt...
Building a Voice Assistant with Whisper, LLM, and TTSBuilding a Voice Assistant with Whisper, LLM, and TTSLearn how to create a voice assistant using Whisper for speech recognition, LLM for conversation, and TTS for text-to-speech.
Build an OCR-Powered PDF Reader & Summarizer with DeepInfra (Kimi K2)Build an OCR-Powered PDF Reader & Summarizer with DeepInfra (Kimi K2)<p>This guide walks you from zero to working: you’ll learn what OCR is (and why PDFs can be tricky), how to turn any PDF—including those with screenshots of tables—into text, and how to let an LLM do the heavy lifting to clean OCR noise, reconstruct tables, and summarize the document. We’ll use DeepInfra’s OpenAI-compatible API [&hellip;]</p>