


openai/whisper-large-v3-turbo

$0.00020 / minute
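As a rough illustration of the per-minute price, the sketch below estimates the cost of transcribing a clip. It assumes billing is strictly proportional to audio duration, with no per-request minimum and no rounding up to whole minutes; `estimate_cost` is a hypothetical helper, not part of any DeepInfra SDK.

```python
from decimal import Decimal

# Listed price on the model page: $0.00020 per minute of audio.
PRICE_PER_MINUTE = Decimal("0.00020")


def estimate_cost(duration_seconds: float) -> Decimal:
    """Estimate the cost of transcribing `duration_seconds` of audio.

    Assumption: cost is exactly proportional to duration (no minimum charge,
    no rounding to whole minutes). Decimal avoids float rounding in money math.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    minutes = Decimal(str(duration_seconds)) / Decimal(60)
    return PRICE_PER_MINUTE * minutes


# One hour of audio: 60 minutes * $0.00020 per minute.
print(estimate_cost(3600))  # 0.01200
```

Under the same assumption, a 90-second clip would cost $0.0003.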

Whisper is a state-of-the-art model for automatic speech recognition (ASR) and speech translation, proposed in the paper "Robust Speech Recognition via Large-Scale Weak Supervision" by Alec Radford et al. from OpenAI. Trained on more than 5M hours of labeled data, Whisper generalises well to many datasets and domains in a zero-shot setting. Whisper large-v3-turbo is a fine-tuned version of a pruned Whisper large-v3: it keeps the large-v3 architecture, except that the number of decoder layers has been reduced from 32 to 4. As a result, the model is much faster, at the cost of a minor degradation in quality.
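Why does cutting decoder layers speed things up so much? Whisper is an encoder-decoder model: the encoder runs once per audio window, while the decoder runs once for every generated token. The toy count below compares layer evaluations for one window, using 32 encoder layers for both models (large-v3 has 32 encoder and 32 decoder layers, and turbo prunes only the decoder) and a hypothetical 200-token transcript. It is a crude proxy that treats every layer pass as equal cost, although an encoder pass over a whole window is far more expensive than a single-token decoder pass, and it ignores caching and batching, so it does not predict the real end-to-end speedup.

```python
def layer_passes(encoder_layers: int, decoder_layers: int, num_tokens: int) -> int:
    """Count layer evaluations needed to transcribe one audio window.

    The encoder runs once per window; the decoder runs once per generated token.
    Crude proxy: every layer pass is counted as the same cost.
    """
    return encoder_layers + decoder_layers * num_tokens


NUM_TOKENS = 200  # hypothetical transcript length for one window

large_v3 = layer_passes(encoder_layers=32, decoder_layers=32, num_tokens=NUM_TOKENS)
turbo = layer_passes(encoder_layers=32, decoder_layers=4, num_tokens=NUM_TOKENS)

print(large_v3, turbo, round(large_v3 / turbo, 2))  # 6432 832 7.73
```

The longer the transcript, the more the per-token decoder term dominates, which is why shrinking the decoder rather than the encoder pays off for autoregressive decoding.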


Input

an audio file


Settings

Task

task to perform: transcription, or translation of the speech into English

Initial Prompt

optional text to provide as a prompt for the first window (Default: empty)

Temperature

temperature to use for sampling (Default: 0)
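The Temperature setting controls how each next token is chosen. In Whisper-style decoding, a temperature of 0 means greedy decoding (always take the most likely token), so the default output is deterministic; higher values sample from a flatter softmax distribution. The function below is a generic sketch of that rule over a list of logits, not the service's actual decoder, and `sample_token` is a hypothetical name.

```python
import math
import random
from typing import Sequence


def sample_token(logits: Sequence[float], temperature: float, rng: random.Random) -> int:
    """Pick a token index from raw logits.

    temperature == 0 -> greedy decoding: return the index of the largest logit.
    temperature > 0  -> sample from softmax(logits / temperature).
    """
    if temperature < 0:
        raise ValueError("temperature must be >= 0")
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(x - m) for x in scaled]
    # random.choices normalizes the weights, so they need not sum to 1.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


rng = random.Random(0)
logits = [1.0, 3.0, 2.0]
print(sample_token(logits, 0, rng))  # 1 (greedy picks the largest logit)
```

Raising the temperature makes lower-scoring tokens more likely to be chosen, trading determinism for diversity.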

Language

language of the audio, as a two-letter ISO 639-1 code (e.g. en, de, ja); if None, the detected language is used
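A client might normalize the Language setting before sending a request. The hypothetical helper below lowercases the value, checks that it has the two-letter shape of an ISO 639-1 code, and passes None through so the model detects the language itself. It checks only the format: it does not verify that the code is actually assigned in ISO 639-1 or supported by Whisper.

```python
import re
from typing import Optional

# Two lowercase ASCII letters, e.g. "en", "de", "ja".
_ISO_639_1_SHAPE = re.compile(r"[a-z]{2}")


def normalize_language(code: Optional[str]) -> Optional[str]:
    """Return a lowercase two-letter code, or None to request auto-detection.

    Only the shape of the code is validated, not membership in ISO 639-1.
    """
    if code is None:
        return None
    code = code.strip().lower()
    if not _ISO_639_1_SHAPE.fullmatch(code):
        raise ValueError(f"expected a two-letter ISO 639-1 code, got {code!r}")
    return code


print(normalize_language("EN"))  # en
```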

Chunk Level

chunk level, either 'segment' or 'word'

Chunk Length S

length, in seconds, of the chunks the audio is split into (Default: 30, 1 ≤ chunk_length_s ≤ 30)
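To make the Chunk Length S setting concrete, here is a sketch of how a clip could be cut into windows of `chunk_length_s` seconds while enforcing the documented bound 1 ≤ chunk_length_s ≤ 30 (the upper limit matches Whisper's 30-second input window). It assumes consecutive, non-overlapping chunks with a shorter final chunk; the service may use overlap or other boundary handling, and `chunk_boundaries` is a hypothetical helper.

```python
from typing import List, Tuple


def chunk_boundaries(duration_s: float, chunk_length_s: float = 30) -> List[Tuple[float, float]]:
    """Split [0, duration_s) into consecutive, non-overlapping (start, end) windows.

    Assumption: no overlap between chunks; the last chunk may be shorter.
    """
    if not 1 <= chunk_length_s <= 30:
        raise ValueError("chunk_length_s must satisfy 1 <= chunk_length_s <= 30")
    if duration_s < 0:
        raise ValueError("duration_s must be non-negative")
    duration_s = float(duration_s)
    bounds = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_length_s, duration_s)
        bounds.append((start, end))
        start = end
    return bounds


print(chunk_boundaries(70))  # [(0.0, 30.0), (30.0, 60.0), (60.0, 70.0)]
```

Shorter chunks mean more windows per clip; the default of 30 seconds uses each of Whisper's windows fully.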

Output