MATRIX: Mask Track Alignment for Interaction-aware Video Generation
Siyoon Jin, Seongchan Kim, Dahyun Chung, Jaeho Lee, Hyunwook Choi, Jisu Nam, Jiyoung Kim, Seungryong Kim

Abstract
Video DiTs have advanced video generation, yet they still struggle to model multi-instance or subject-object interactions. This raises a key question: How do these models internally represent interactions? To answer this, we curate MATRIX-11K, a video dataset with interaction-aware captions and multi-instance mask tracks. Using this dataset, we conduct a systematic analysis that formalizes two perspectives of video DiTs: semantic grounding, via video-to-text attention, which evaluates whether noun and verb tokens capture instances and their relations; and semantic propagation, via video-to-video attention, which assesses whether instance bindings persist across frames. We find both effects concentrate in a small subset of interaction-dominant layers. Motivated by this, we introduce MATRIX, a simple and effective regularization that aligns attention in specific layers of video DiTs with multi-instance mask tracks from the MATRIX-11K dataset, enhancing both grounding and propagation. We further propose InterGenEval, an evaluation protocol for interaction-aware video generation. In experiments, MATRIX improves both interaction fidelity and semantic alignment while reducing drift and hallucination. Extensive ablations validate our design choices. Code and weights will be released.
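The abstract describes aligning attention in selected layers with multi-instance mask tracks but does not specify the loss. Below is a minimal sketch of what such an attention-mask alignment term could look like, assuming PyTorch; the function and tensor names (mask_alignment_loss, attn, masks) are hypothetical, and the authors' actual objective may differ.

```python
import torch


def mask_alignment_loss(attn: torch.Tensor, masks: torch.Tensor,
                        eps: float = 1e-6) -> torch.Tensor:
    """Hypothetical attention-to-mask-track alignment regularizer.

    attn:  (B, Q, K) attention weights from an interaction-dominant
           layer; each query row sums to 1 over the K key tokens.
    masks: (B, Q, K) binary targets built from rasterized mask tracks;
           masks[b, q, k] = 1 when query token q and key token k belong
           to the same instance (possibly in different frames).
    Returns a scalar loss penalizing attention mass that falls outside
    each query's own instance region.
    """
    # Fraction of each query's attention landing inside its instance mask.
    in_mask = (attn * masks).sum(dim=-1)            # (B, Q)
    # Negative log-likelihood of the in-mask attention mass.
    return -in_mask.clamp(min=eps).log().mean()
```

Under this sketch, the binary targets would come from downsampling each frame's instance masks to the token grid of the chosen layers, so that video-to-video attention is encouraged to stay within (and track) each instance, matching the abstract's notion of semantic propagation; an analogous term over video-to-text attention would cover semantic grounding.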