Prompt
Text prompt describing the video content.
Negative Prompt
Negative text prompt (optional); leave blank to fall back to the model's default negative prompt. (Default: uncanny face, mask-like, plastic skin, doll-like, waxy, mannequin, cgi, 3d render, deformed face, distorted face, extra fingers, deformed hands, blurry, washed out, vintage, 1970s, sepia, grainy, low quality)
Seconds
Clip duration: always 5 seconds (fixed/required for this model).
Resolution
Output resolution: always 1080p (fixed/required for this model).
Orientation
Output orientation: always landscape (fixed/required for this model).
Image Url
First-frame image for image-to-video (i2v): an http(s) URL or a data: URI. Required only for i2v; omit for text-to-video. (Default: empty)
Seed
Specify a seed for reproducible output. (Default: empty)
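Because image_url accepts a data: URI as well as an http(s) URL, a local first-frame image can be embedded directly in the request. The following is a minimal Python sketch of that encoding, assuming base64 encoding and a MIME type inferred from the file extension. The helper name image_to_data_uri is hypothetical and not part of any DeepInfra SDK, and this page does not state a size limit for inline images.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_uri(path: str) -> str:
    """Encode a local image file as a base64 data: URI (hypothetical helper)."""
    # Infer the MIME type from the file extension, e.g. ".png" -> "image/png".
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {path!r}")
    # Read the raw bytes and base64-encode them as ASCII text.
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The returned string can then be supplied as the image_url value for an image-to-video request.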
LTX-2.3 is a diffusion-transformer (DiT) audio-video foundation model from Lightricks that generates high-fidelity video with synchronized audio from text or a starting image. This endpoint serves the distilled variant, accelerated with FastVideo (Hao AI Lab, UCSD) to produce results in only a few denoising steps.
Provide a descriptive prompt. For image-to-video, also pass image_url (an http(s) URL or a data: URI).
Use negative_prompt to steer away from unwanted artifacts and seed for reproducible results.
Detailed, concrete prompts that cover subject, action, setting, lighting, camera motion, and any
sound or dialogue produce the strongest results; for image-to-video, describe the motion you want
applied to the supplied image.
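As an illustration of how these inputs fit together, here is a minimal Python sketch that assembles a request body. The field names negative_prompt, seed, and image_url come from the text above; the lowercase prompt key, the flat JSON shape, and the build_payload helper are assumptions for illustration. The endpoint URL and authentication are not described here, so sending the request is deliberately left out.

```python
import json
from typing import Optional


def build_payload(
    prompt: str,
    negative_prompt: Optional[str] = None,
    seed: Optional[int] = None,
    image_url: Optional[str] = None,
) -> dict:
    """Assemble request fields (hypothetical helper; JSON shape is assumed)."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    payload = {"prompt": prompt}
    # Leave negative_prompt out to fall back to the model's default negative.
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt
    # A fixed seed is what makes a run reproducible.
    if seed is not None:
        payload["seed"] = seed
    # image_url switches the request to image-to-video; omit it for text-to-video.
    if image_url is not None:
        if not image_url.startswith(("http://", "https://", "data:")):
            raise ValueError("image_url must be an http(s) URL or a data: URI")
        payload["image_url"] = image_url
    return payload


# Text-to-video example: no image_url, fixed seed for repeatable output.
example = build_payload(
    prompt=(
        "A red kite rises over a windy beach at sunset, slow dolly-in, "
        "waves crashing and gulls calling in the background"
    ),
    seed=42,
)
print(json.dumps(example, indent=2))
```

Duration, resolution, and orientation are fixed for this model, so the sketch does not set them.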
© 2026 DeepInfra. All rights reserved.