Dia directly generates highly realistic dialogue from a transcript. You can condition the output on audio, enabling emotion and tone control. The model can also produce nonverbal communication such as laughter, coughing, and throat clearing.
You can use cURL or any other HTTP client to run inferences:
curl -X POST \
-H "Authorization: bearer $DEEPINFRA_TOKEN" \
-F 'input=[S1] Dia is an open weights text to dialogue model. [S2] You get full control over scripts and voices.' \
'https://api.deepinfra.com/v1/inference/nari-labs/Dia-1.6B'
which will give you back something similar to:
{
  "audio": null,
  "input_character_length": 0,
  "output_format": "",
  "words": [
    {
      "text": "Hello",
      "start": 0.0,
      "end": 1.0,
      "confidence": 0.5
    },
    {
      "text": "World",
      "start": 4.0,
      "end": 5.0,
      "confidence": 0.5
    }
  ],
  "request_id": null,
  "inference_status": {
    "status": "unknown",
    "runtime_ms": 0,
    "cost": 0.0,
    "tokens_generated": 0,
    "tokens_input": 0
  }
}
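In a real response the audio field holds the generated audio instead of null. Assuming it comes back as a base64-encoded data URI (an assumption, not shown in the placeholder response above), you could save it to a playable file like this:

# Run the inference and keep the JSON response.
curl -s -X POST \
  -H "Authorization: bearer $DEEPINFRA_TOKEN" \
  -F 'input=[S1] Dia is an open weights text to dialogue model. [S2] You get full control over scripts and voices.' \
  'https://api.deepinfra.com/v1/inference/nari-labs/Dia-1.6B' > response.json

# Strip the assumed "data:audio/...;base64," prefix and decode the payload (requires jq).
jq -r '.audio' response.json | sed 's/^data:audio\/[a-z]*;base64,//' | base64 --decode > dialogue.wav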
speaker_transcript (string)
Transcript of the given speaker audio. If not provided, the speaker audio will be used as is.
max_new_tokens (integer)
Controls the maximum length of the generated audio (more tokens = longer audio).
Default value: 3072
Range: 500 ≤ max_new_tokens ≤ 4096
cfg_scale (integer)
Higher values increase adherence to the text prompt.
Default value: 3
Range: 1 ≤ cfg_scale ≤ 5
temperature (number)
Lower values make the output more deterministic; higher values increase randomness.
Default value: 1.3
Range: 1 ≤ temperature ≤ 1.5
top_p (number)
Filters the vocabulary to the most likely tokens whose cumulative probability reaches P.
Default value: 0.95
Range: 0.8 ≤ top_p ≤ 1
cfg_filter_top_k (integer)
Top-k filter applied during CFG guidance.
Default value: 35
Range: 15 ≤ cfg_filter_top_k ≤ 50
speed (number)
Adjusts the speed of the generated audio (1.0 = original speed).
Default value: 0.94
Range: 0.8 ≤ speed ≤ 1
webhook (file)
The webhook to call when inference is done. By default, you will get the output in the response of your inference request.
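The generation parameters can be combined in a single request. The sketch below mirrors the multipart form style of the earlier cURL example and assumes the endpoint accepts these fields the same way; all values stay within the documented ranges.

# Sketch: longer output, stronger prompt adherence, slightly lower temperature and speed.
curl -X POST \
  -H "Authorization: bearer $DEEPINFRA_TOKEN" \
  -F 'input=[S1] Dia is an open weights text to dialogue model. [S2] You get full control over scripts and voices.' \
  -F 'max_new_tokens=4096' \
  -F 'cfg_scale=4' \
  -F 'temperature=1.1' \
  -F 'top_p=0.9' \
  -F 'cfg_filter_top_k=30' \
  -F 'speed=0.9' \
  'https://api.deepinfra.com/v1/inference/nari-labs/Dia-1.6B'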