ByteDance/
$1.200 / 1M tokens
ByteDance's Seedance 1.5 Pro is a professional video model using V2A native generation for integrated, synced audio-visual output, enhancing efficiency of professional video creation.

Prompt
text prompt for video generation
Resolution
resolution of the output video
Aspect Ratio
aspect ratio of the output video
Duration
duration of the output video in seconds (4-12, or -1 for model to decide) (Default: empty)
Seed
random seed for reproducible output (Default: empty, -1 ≤ seed < 4294967296)
Camera Fixed
whether to use a fixed camera angle
Watermark
whether to add a watermark to the output video
Generate Audio
whether the generated video includes audio synchronized with the visuals
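The documented ranges for the parameters above can be checked before a request is sent. The following is a minimal sketch, not the provider's client: the function name `build_video_request` and the payload keys (`prompt`, `resolution`, `aspect_ratio`, `duration`, `seed`, `camera_fixed`, `watermark`, `generate_audio`) are hypothetical names derived from the form labels, and the real API may use different field names. Only the duration and seed ranges are taken from this page. Parameters left as `None` are omitted, mirroring the "Default: empty" behavior.

```python
from typing import Optional

# Exclusive upper bound for the seed, as stated on this page (2**32).
SEED_UPPER_BOUND = 4294967296


def build_video_request(
    prompt: str,
    resolution: Optional[str] = None,
    aspect_ratio: Optional[str] = None,
    duration: Optional[int] = None,
    seed: Optional[int] = None,
    camera_fixed: Optional[bool] = None,
    watermark: Optional[bool] = None,
    generate_audio: Optional[bool] = None,
) -> dict:
    """Validate the documented ranges and return a request payload.

    All key names are assumptions based on the form labels.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")

    # Duration: 4-12 seconds, or -1 to let the model decide.
    if duration is not None and duration != -1 and not (4 <= duration <= 12):
        raise ValueError("duration must be in 4..12, or -1")

    # Seed: -1 <= seed < 4294967296.
    if seed is not None and not (-1 <= seed < SEED_UPPER_BOUND):
        raise ValueError("seed must satisfy -1 <= seed < 4294967296")

    optional_fields = {
        "resolution": resolution,
        "aspect_ratio": aspect_ratio,
        "duration": duration,
        "seed": seed,
        "camera_fixed": camera_fixed,
        "watermark": watermark,
        "generate_audio": generate_audio,
    }
    payload = {"prompt": prompt}
    # Omit unset parameters so the service can apply its own defaults.
    payload.update({k: v for k, v in optional_fields.items() if v is not None})
    return payload
```

For example, `build_video_request("a cat on a skateboard", duration=5, seed=42)` returns a payload containing only the prompt, duration, and seed, while `duration=3` raises `ValueError`.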
Seedance 1.5 Pro is ByteDance's new professional-grade audio-visual co-generation model. It builds on multi-shot narrative and HD generation capabilities and supports integrated audio and video output for a unified creation experience (visuals, human voice, music, and sound effects). The model includes a start/end frame feature: creators lock the video's style, composition, and characters by setting the first and last frames, which then drive the generation of smooth, dynamic video. This improves the efficiency, controllability, and artistic expressiveness of professional video creation.
Supports joint generation of audio and video, covering ambient sounds, action sounds, synthesized sound effects, musical instruments, background music, and human voices, with millisecond-level audio-visual synchronization.
Supports monologue and multi-person dialogue with millisecond-level lip sync across multiple languages, reproducing the natural feel of real conversation. Supported languages include English, Mandarin, Japanese, Korean, Spanish, Indonesian, Shaanxi dialect (China), and Sichuan dialect (China).
Produces motion with natural amplitude and a strong sense of rhythm, capturing action details accurately. Character emotions and expressions are rendered with subtlety, which makes the output more vivid and brings it to cinematic-grade quality.
© 2026 Deep Infra. All rights reserved.