hexgrad/Kokoro-82M

Kokoro is an open-weight text-to-speech (TTS) model with 82 million parameters. Despite its lightweight architecture, it delivers quality comparable to that of larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.

Public
$0.80 per million characters
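
As a rough illustration of the listed price, the arithmetic can be sketched in Python. This assumes billing is based on the number of input characters and that `len()` approximates how characters are counted; neither detail is stated on this page, and `estimate_cost_usd` is a hypothetical helper, not part of any SDK.

```python
# Listed price on this page: $0.80 per one million characters.
PRICE_PER_MILLION_CHARS_USD = 0.80


def estimate_cost_usd(text: str) -> float:
    """Estimate the cost of synthesizing `text` in US dollars.

    Assumption: cost scales linearly with the input character count,
    measured here with len(). The provider's counting rules may differ.
    """
    return len(text) * PRICE_PER_MILLION_CHARS_USD / 1_000_000
```

Under these assumptions, a 10,000-character script would cost roughly $0.008.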

HTTP/cURL API

You can use cURL or any other HTTP client to run inference:

curl -X POST \
    -d '{"text": "The quick brown fox jumps over the lazy dog"}'  \
    -H "Authorization: bearer $DEEPINFRA_TOKEN"  \
    -H 'Content-Type: application/json'  \
    'https://api.deepinfra.com/v1/inference/hexgrad/Kokoro-82M'

The response has the following shape (the values shown are placeholders, not the actual output for the request above):

{
  "audio": null,
  "input_character_length": 0,
  "output_format": "",
  "words": [
    {
      "end": 1.0,
      "start": 0.0,
      "text": "Hello"
    },
    {
      "end": 5.0,
      "start": 4.0,
      "text": "World"
    }
  ],
  "request_id": null,
  "inference_status": {
    "status": "unknown",
    "runtime_ms": 0,
    "cost": 0.0,
    "tokens_generated": 0,
    "tokens_input": 0
  }
}
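
The same request can be sketched in Python using only the standard library. The URL, headers, and `text` field are taken from the cURL example; the helper names (`build_request`, `send`, `summarize_response`) are hypothetical, and the summary reads only fields that appear in the sample response. The encoding of the `audio` field is not documented here, so this sketch only checks whether it is present.

```python
import json
import urllib.request

API_URL = "https://api.deepinfra.com/v1/inference/hexgrad/Kokoro-82M"


def build_request(text: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request from the cURL example."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (a real, billable call)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize_response(payload: dict) -> dict:
    """Pick out a few fields from a response shaped like the sample above."""
    status = payload.get("inference_status") or {}
    return {
        "has_audio": payload.get("audio") is not None,
        "output_format": payload.get("output_format"),
        "word_count": len(payload.get("words") or []),
        "runtime_ms": status.get("runtime_ms"),
        "cost": status.get("cost"),
    }
```

A call would look like `summarize_response(send(build_request("Hello", os.environ["DEEPINFRA_TOKEN"])))`, after importing `os` and with the token set in the environment.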

Input fields

text (string)

Text to convert to speech


output_format (string)

Output format for the speech

Default value: "wav"

Allowed values: mp3, opus, flac, wav, pcm
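
A client can check `output_format` before sending a request, so a typo fails locally rather than at the API. The following is a minimal sketch built from the allowed values and default listed above; `with_output_format` is a hypothetical helper, and how the server handles an unsupported value is not documented on this page.

```python
ALLOWED_OUTPUT_FORMATS = ("mp3", "opus", "flac", "wav", "pcm")
DEFAULT_OUTPUT_FORMAT = "wav"


def with_output_format(payload: dict, output_format: str = DEFAULT_OUTPUT_FORMAT) -> dict:
    """Return a copy of `payload` with a validated `output_format` field."""
    if output_format not in ALLOWED_OUTPUT_FORMATS:
        allowed = ", ".join(ALLOWED_OUTPUT_FORMATS)
        raise ValueError(f"output_format must be one of: {allowed}")
    # Copy instead of mutating so the caller's dict stays unchanged.
    return {**payload, "output_format": output_format}
```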


preset_voice (array)

Preset voice name to use for the speech

Default value: ["af_bella"]
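
Because `preset_voice` is an array, a single voice still has to be sent as a one-element list. The sketch below guards against passing a bare string; `with_voices` is a hypothetical helper, and this page does not say what the model does when several voices are listed, so no behavior is assumed for that case.

```python
DEFAULT_VOICES = ["af_bella"]


def with_voices(payload: dict, voices=None) -> dict:
    """Return a copy of `payload` with `preset_voice` set to a list of names."""
    if isinstance(voices, str):
        # A bare string would otherwise be split into single characters.
        raise TypeError("preset_voice must be a list of names, e.g. ['af_bella']")
    names = list(DEFAULT_VOICES) if voices is None else list(voices)
    if not names or not all(isinstance(name, str) and name for name in names):
        raise ValueError("preset_voice must contain at least one non-empty name")
    return {**payload, "preset_voice": names}
```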


speed (number)

Speed of the speech

Range: 0.25 ≤ speed ≤ 4
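
The documented range can be enforced on the client side. This is a minimal sketch with two hypothetical helpers: `validate_speed` rejects out-of-range values, and `clamp_speed` pulls them to the nearest bound. Which behavior is appropriate depends on the application.

```python
MIN_SPEED = 0.25
MAX_SPEED = 4.0


def validate_speed(speed: float) -> float:
    """Return `speed` as a float, or raise if it is outside 0.25..4."""
    # NaN fails both comparisons, so it is rejected as well.
    if not (MIN_SPEED <= speed <= MAX_SPEED):
        raise ValueError(f"speed must satisfy {MIN_SPEED} <= speed <= {MAX_SPEED}")
    return float(speed)


def clamp_speed(speed: float) -> float:
    """Clamp `speed` into the documented range instead of raising."""
    return float(min(MAX_SPEED, max(MIN_SPEED, speed)))
```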


stream (boolean)

Whether to stream the output

Default value: false


return_timestamps (boolean)

Whether to return timestamps

Default value: false
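
When `return_timestamps` is set to true, the `words` array in the response is presumably where the per-word timings appear. The sketch below reads that array as it is shaped in the sample response; it assumes `start` and `end` are in seconds, which this page does not state, and both function names are hypothetical.

```python
def word_timings(response: dict) -> list:
    """Return (text, start, end) tuples from the response's `words` array."""
    words = response.get("words") or []
    return [(w["text"], float(w["start"]), float(w["end"])) for w in words]


def format_timings(response: dict) -> str:
    """Render one line per word with its start and end time."""
    return "\n".join(
        f"{start:7.2f} -> {end:7.2f}  {text}"
        for text, start, end in word_timings(response)
    )
```

Timings in this form could feed a caption track or a word-highlighting UI.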


sample_rate (integer)

Sample rate for the output audio.


target_min_tokens (integer)

Minimum number of tokens for the output.


target_max_tokens (integer)

Maximum number of tokens for the output.


absolute_max_tokens (integer)

Absolute maximum number of tokens for the output.


webhook (file)

The webhook to call when inference is done. By default, the output is returned in the response to your inference request.
