Chatterbox is Resemble AI's first production-grade open source TTS model. Licensed under MIT, Chatterbox has been benchmarked against leading closed-source systems like ElevenLabs, and is consistently preferred in side-by-side evaluations. Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. It's also the first open source TTS model to support emotion exaggeration control, a powerful feature that makes your voices stand out.
Text: Text to convert to speech.
Voice ID: Voice ID created on DeepInfra (Default: empty).
Exaggeration: Exaggeration factor for the speech (Default: 0.25, 0 ≤ exaggeration ≤ 1).
CFG: CFG factor for the speech (Default: 0.5, 0.1 ≤ cfg ≤ 1).
Temperature: Temperature for the speech (Default: 0.7, 0 ≤ temperature ≤ 2).
Seed: Seed for the random number generator (Default: empty, 0 ≤ seed ≤ 2147483647).
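The ranges above can be checked on the client before a request is submitted. The following is a minimal sketch, not the service's actual API: the class name `ChatterboxParams`, the field names `text` and `voice_id`, and the `to_payload` helper are assumptions made for illustration. Only the defaults and bounds are taken from the parameter list.

```python
from dataclasses import dataclass, asdict
from typing import Optional

MAX_SEED = 2147483647  # upper bound for seed, from the parameter list


def _check_range(name: str, value: float, low: float, high: float) -> None:
    """Raise ValueError when value falls outside the inclusive range."""
    if not (low <= value <= high):
        raise ValueError(f"{name}={value} is outside [{low}, {high}]")


@dataclass
class ChatterboxParams:
    """Hypothetical container for the documented parameters."""

    text: str                       # assumed field name
    voice_id: Optional[str] = None  # assumed field name; empty by default
    exaggeration: float = 0.25      # 0 <= exaggeration <= 1
    cfg: float = 0.5                # 0.1 <= cfg <= 1
    temperature: float = 0.7        # 0 <= temperature <= 2
    seed: Optional[int] = None      # empty by default

    def __post_init__(self) -> None:
        _check_range("exaggeration", self.exaggeration, 0.0, 1.0)
        _check_range("cfg", self.cfg, 0.1, 1.0)
        _check_range("temperature", self.temperature, 0.0, 2.0)
        if self.seed is not None:
            _check_range("seed", self.seed, 0, MAX_SEED)

    def to_payload(self) -> dict:
        """Return a plain dict, dropping parameters that were left empty."""
        return {k: v for k, v in asdict(self).items() if v is not None}
```

Constructing `ChatterboxParams(text="Hello", cfg=0.05)` raises a `ValueError` because 0.05 is below the documented minimum of 0.1, so an out-of-range value is caught before any request is made.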
We're excited to introduce Chatterbox, Resemble AI's first production-grade open source TTS model. Licensed under MIT, Chatterbox has been benchmarked against leading closed-source systems like ElevenLabs, and is consistently preferred in side-by-side evaluations.
Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. It's also the first open source TTS model to support emotion exaggeration control, a powerful feature that makes your voices stand out. Try it now on our Hugging Face Gradio app.
If you like the model but need to scale or tune it for higher accuracy, check out our competitively priced TTS service. It delivers reliable performance with ultra-low latency of under 200 ms, which makes it ideal for production use in agents, applications, or interactive media.
General Use (TTS and Voice Agents):
Settings of exaggeration=0.5, cfg=0.5 work well for most prompts. Lowering cfg to around 0.3 can improve pacing.

Expressive or Dramatic Speech:
Use lower cfg values (e.g. ~0.3) and increase exaggeration to around 0.7 or higher. Higher exaggeration tends to speed up speech; reducing cfg helps compensate with slower, more deliberate pacing.
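The tips above can be captured as a small preset helper. This is an illustrative sketch only: the function name `suggest_settings` and its `style` and `improve_pacing` arguments are hypothetical, and the returned values are the starting points quoted in the tips, not tuned results.

```python
def suggest_settings(style: str = "general", improve_pacing: bool = False) -> dict:
    """Return starting values for exaggeration and cfg (hypothetical helper).

    style: "general" for TTS and voice agents, "expressive" for dramatic speech.
    improve_pacing: for general use, lower cfg to about 0.3 to improve pacing.
    """
    if style == "general":
        settings = {"exaggeration": 0.5, "cfg": 0.5}
        if improve_pacing:
            settings["cfg"] = 0.3
    elif style == "expressive":
        # Higher exaggeration tends to speed up speech, so cfg is lowered
        # to compensate with slower, more deliberate pacing.
        settings = {"exaggeration": 0.7, "cfg": 0.3}
    else:
        raise ValueError(f"unknown style: {style!r}")
    return settings


if __name__ == "__main__":
    print(suggest_settings("general"))
    print(suggest_settings("expressive"))
```

Treat the returned values as a first guess and adjust by ear. For expressive speech the tips allow exaggeration above 0.7, within the documented upper limit of 1.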