It’s ByteDance’s first video model with native audio support, generating dialogue, music, and ambient sound directly as part of the clip.
It supports:
  • Aspect ratios: 16:9, 1:1, 9:16
  • Durations: 4s, 5s, 6s, 7s, 8s, 9s, 10s, 12s
  • References: first frame, last frame
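
As a rough illustration, the options above can be checked on the client side before a request is sent. The sketch below is not the model's actual API: `ClipRequest`, `validate`, and all field names are hypothetical, and only the allowed values come from the list above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Allowed values copied from the list above.
SUPPORTED_ASPECT_RATIOS = {"16:9", "1:1", "9:16"}
SUPPORTED_DURATIONS_S = {4, 5, 6, 7, 8, 9, 10, 12}


@dataclass
class ClipRequest:
    """Hypothetical request shape; field names are assumptions."""
    prompt: str
    aspect_ratio: str = "16:9"
    duration_s: int = 5
    first_frame: Optional[str] = None  # path or URL of an optional first-frame image
    last_frame: Optional[str] = None   # path or URL of an optional last-frame image


def validate(req: ClipRequest) -> List[str]:
    """Return a list of human-readable problems; an empty list means the request is fine."""
    problems = []
    if req.aspect_ratio not in SUPPORTED_ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {req.aspect_ratio}")
    if req.duration_s not in SUPPORTED_DURATIONS_S:
        problems.append(f"unsupported duration: {req.duration_s}s")
    return problems


if __name__ == "__main__":
    # 11s is not in the supported list, so this reports one problem.
    print(validate(ClipRequest(prompt="a street musician at dusk", duration_s=11)))
```

Note that the duration list is not a continuous range: it skips 11s, so a simple minimum/maximum check would wrongly accept that value.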