Chat with books using DeepInfra and LlamaIndex

Published on 2024.06.07 by Oguz Vuruskaner

At DeepInfra, we are excited to announce our integration with LlamaIndex. LlamaIndex is a library for indexing and searching documents using a range of language models and embedding models. In this blog post, we will show you how to chat with books using DeepInfra and LlamaIndex.

We will be using the Project Gutenberg library to get the text of the book "Crime and Punishment" by Fyodor Dostoevsky. We will then use the Meta Llama 3 70B language model and the MiniLM embedding model to chat with the book.
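
To give a rough feel for what "index and search with embeddings" means, here is a toy sketch. It is not how LlamaIndex or the MiniLM model work internally: the `embed` function is a hypothetical stand-in that counts words instead of calling a real embedding model, and the three sample chunks are made up for illustration. The idea it shows is the general one: turn chunks and the question into vectors, then pick the chunk whose vector is closest to the question's.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(chunks, question, top_k=1):
    """Return the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)
    return ranked[:top_k]


# Made-up sample chunks, standing in for pieces of the book.
chunks = [
    "Raskolnikov lives in a tiny garret in St. Petersburg.",
    "Porfiry questions Raskolnikov about the article he wrote.",
    "Sonia reads the story of Lazarus aloud.",
]
best = retrieve(chunks, "Who reads the story of Lazarus?")
print(best)
```

In the real pipeline, the retrieved chunks are passed to the language model as context for its answer; a learned embedding model captures meaning far better than word counts do.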

Requirements

Python 3.9 or higher
DeepInfra API Key

Installation

First, let's create a virtual environment and activate it:

python3 -m venv venv
source venv/bin/activate

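Here are the required packages to install. The script in this post also imports requests and dotenv, so requests and python-dotenv are listed explicitly:

llama-index
llama-index-llms-deepinfra
llama-index-embeddings-deepinfra
python-dotenv
requests

Let's install them:

pip install llama-index llama-index-llms-deepinfra llama-index-embeddings-deepinfra python-dotenv requests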

Before getting started, we also need a DeepInfra API key. You can create one by logging in to your account at deepinfra.com, opening the Dashboard, and selecting API Keys.

Let's create a .env file in the root directory of the project and add the following lines:

DEEPINFRA_API_TOKEN=YOUR_DEEPINFRA_API_KEY
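
A missing or placeholder token usually only shows up later as an authentication error, so it can help to check for it up front. The helper below is a minimal sketch, not part of the DeepInfra or LlamaIndex API: `require_env` is a hypothetical name, and it simply reads a variable and fails early with a readable message.

```python
import os


def require_env(name, environ=None):
    """Return the value of an environment variable or raise a clear error."""
    # Allow a plain dict to be passed in, which makes the helper easy to test.
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip()
    # Treat the placeholder from the .env template as "not set".
    if not value or value == "YOUR_DEEPINFRA_API_KEY":
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

Calling `require_env("DEEPINFRA_API_TOKEN")` after `load_dotenv(...)` would stop the script before any network request is made if the token is absent.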

Code Implementation

Here's a Python script to chat with the book "Crime and Punishment":

import re

import requests
from dotenv import find_dotenv, load_dotenv

# Load DEEPINFRA_API_TOKEN from the .env file before the clients are created.
load_dotenv(find_dotenv())

from llama_index.core import Document, VectorStoreIndex
from llama_index.embeddings.deepinfra import DeepInfraEmbeddingModel
from llama_index.llms.deepinfra import DeepInfraLLM

LLM = "meta-llama/Meta-Llama-3-70B-Instruct"
EMBEDDING = "sentence-transformers/all-MiniLM-L12-v2"
BOOK_TITLE = "Crime and Punishment"


def maybe_get_gutenberg_book_id(title):
    """Search Gutendex for the title and return the first matching book ID, or None."""
    # Passing the title via params lets requests URL-encode the spaces.
    response = requests.get(
        "https://gutendex.com/books/", params={"search": title}, timeout=30
    )
    response.raise_for_status()
    for book in response.json()["results"]:
        if title.lower() in book["title"].lower():
            return book["id"]
    return None


def get_document(book_id):
    """Download the plain-text book from Project Gutenberg and wrap it in a Document."""
    url = f"https://www.gutenberg.org/files/{book_id}/{book_id}-0.txt"
    response = requests.get(url, timeout=30)
    # Fail loudly instead of indexing an error page.
    response.raise_for_status()
    # Strip non-ASCII characters.
    text = re.sub(r"[^\x00-\x7F]+", "", response.text)
    return Document(text=text)


if __name__ == "__main__":
    llm = DeepInfraLLM(LLM, max_tokens=1000)
    embed_model = DeepInfraEmbeddingModel(EMBEDDING)

    book_id = maybe_get_gutenberg_book_id(BOOK_TITLE)
    if book_id is None:
        raise SystemExit(f"No Project Gutenberg book found for {BOOK_TITLE!r}")
    document = get_document(book_id)

    # Embed the book and build an in-memory vector index over it.
    index = VectorStoreIndex.from_documents([document], embed_model=embed_model)
    chat_engine = index.as_chat_engine(
        llm=llm, embed_model=embed_model, max_iterations=20
    )

    response = chat_engine.chat(
        "Summarize the discussion between Raskolnikov and Pyotr Petrovich"
    )
    print(response)

    # Example output:
    # The conversation between Raskolnikov and Pyotr Petrovich takes place at the office of...
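
The script asks a single question, but a chat engine is meant for follow-ups. As a minimal sketch, the helper below sends several questions in order and collects the replies. It relies only on the `chat` method already used above; `ask_all` and `EchoEngine` are hypothetical names, and `EchoEngine` is a stub so the snippet can be tried without an API key.

```python
def ask_all(chat_engine, questions):
    """Send each question to the chat engine in order and collect the replies as strings."""
    answers = []
    for question in questions:
        response = chat_engine.chat(question)
        answers.append(str(response))
    return answers


class EchoEngine:
    """Hypothetical stand-in for a real chat engine, used only for a dry run."""

    def chat(self, message):
        return f"(stub reply to: {message})"


replies = ask_all(
    EchoEngine(),
    ["Who is Sonia?", "What does she ask Raskolnikov to do?"],
)
for reply in replies:
    print(reply)
```

With the real `chat_engine` from the script in place of `EchoEngine()`, the second question can lean on the first, provided the engine keeps conversation history between calls.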

Conclusion

Voila! You have successfully chatted with the book "Crime and Punishment" using DeepInfra and LlamaIndex. You can now use this code snippet to chat with any book of your choice. Enjoy reading!
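
One caveat when switching books: the script builds the download URL from a fixed `{book_id}-0.txt` pattern, which may not exist for every title. A possible alternative, sketched below, is to read the download link from the search result itself. This assumes each Gutendex result carries a `formats` mapping from MIME type to URL; `pick_plain_text_url` is a hypothetical helper, and the sample record uses made-up values.

```python
def pick_plain_text_url(book):
    """Return the first plain-text download URL listed for a book, or None."""
    formats = book.get("formats", {})
    for mime_type, url in formats.items():
        # Keys may carry a charset suffix, e.g. "text/plain; charset=utf-8".
        if mime_type.startswith("text/plain"):
            return url
    return None


# Made-up record shaped like one search result (assumed structure).
sample_book = {
    "id": 0,
    "title": "Example Title",
    "formats": {
        "text/html": "https://example.org/book.html",
        "text/plain; charset=utf-8": "https://example.org/book.txt",
    },
}
print(pick_plain_text_url(sample_book))
```

If the helper returns `None`, the book has no plain-text edition listed and a different title or format would be needed.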

For more information on LlamaIndex, please visit our LLM documentation and Embedding documentation.

Feel free to experiment with other books and questions to explore the capabilities of DeepInfra. See you in the next blog post!

Happy chatting! 📚🦙
