How to use OpenAI Whisper with per-sentence and per-word timestamp segmentation using DeepInfra
Published on 2023.04.05 by Yessen Kanapin

Getting started

To use DeepInfra's API, you'll need an API key.

  1. Sign up or log in to your DeepInfra account
  2. Navigate to the Dashboard / API Keys section
  3. Create a new API key if you don't have one already

You'll use this API key in your requests to authenticate with our services.

Running speech recognition

Whisper is a speech-to-text model from OpenAI. Given an audio file containing speech, it produces a transcript of what was said. It comes in several sizes (small, base, large, etc.) and in English-only variants; see deepinfra.com for the full list. By default, Whisper segments its timestamps by sentence. We also host whisper-timestamped, which provides timestamps for the individual words in the audio. You can call it through our REST API like this:

curl -X POST \
  -F "audio=@/home/user/all-in-01.mp3" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  'https://api.deepinfra.com/v1/inference/openai/whisper-timestamped-medium.en'
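Once the request returns, the JSON response can be turned into a readable list of sentence and word timestamps. The following is a minimal sketch, not a definitive client. It assumes the response follows the usual whisper-timestamped layout: a `segments` list whose entries have `start`, `end`, `text`, and a `words` list with `text` (or `word`), `start`, and `end`. These field names are assumptions, so compare them with the actual response before relying on them. The helper names `fmt` and `timestamp_lines` and the sample data are hypothetical.

```python
def fmt(seconds: float) -> str:
    """Format a time in seconds as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, rem = divmod(total_ms, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def timestamp_lines(response: dict) -> list:
    """Build one line per sentence segment, followed by one indented line per word.

    Assumed (unverified) response shape:
    {"segments": [{"start", "end", "text", "words": [{"text", "start", "end"}]}]}
    """
    lines = []
    for segment in response.get("segments", []):
        text = segment["text"].strip()
        lines.append(f"[{fmt(segment['start'])} - {fmt(segment['end'])}] {text}")
        for word in segment.get("words", []):
            # Some Whisper variants name this field "word" instead of "text".
            token = word.get("text", word.get("word", "")).strip()
            lines.append(f"    {fmt(word['start'])} - {fmt(word['end'])}  {token}")
    return lines


# Made-up sample in the assumed shape, for illustration only.
sample = {
    "text": "Hello world.",
    "segments": [
        {
            "start": 0.0,
            "end": 1.5,
            "text": " Hello world.",
            "words": [
                {"text": "Hello", "start": 0.0, "end": 0.62},
                {"text": "world.", "start": 0.7, "end": 1.5},
            ],
        }
    ],
}

for line in timestamp_lines(sample):
    print(line)
```

To try it on real output, save the response to a file (for example by adding `-o result.json` to the curl command) and pass the result of `json.load` on that file to `timestamp_lines` in place of `sample`.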

For additional parameters and more ways to call this model, see the documentation page, which has the complete API reference and examples.

If you have any questions, just reach out to us on our Discord server.
