Ditto-1M Instruction-Driven Video Editing Dataset

Date

8 days ago

Organization

Zhejiang University
Ant Group

Paper URL

2510.15742

License

Non-Commercial

Ditto-1M is an instruction-driven video editing dataset released in 2025 by the Hong Kong University of Science and Technology, Ant Group, Zhejiang University, and other institutions. It was introduced in the paper "Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset" and aims to advance video editing models driven by natural-language instructions, using large-scale, high-quality synthetic samples to improve models' understanding of complex instructions and the accuracy of the videos they generate.

The dataset contains approximately 1,000,000 high-fidelity video editing triplets, each consisting of a source video, an editing instruction, and the edited video. Each video has an average of 101 frames at a resolution of 1,280×720. The editing tasks fall into three categories (a minimal record sketch follows the list):

  • Global style transfer: including artistic style changes, color grading, visual effects, etc.
  • Global freeform editing: including complex scene modifications, environmental changes, creative transformations, etc.
  • Local editing: including precise object modification, attribute changes, local adjustments, etc.
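
As a rough illustration of the triplet structure described above, the sketch below shows one way such a record could be represented in Python. The field names, paths, and task-type labels are hypothetical and do not reflect the official release format of Ditto-1M.

```python
from dataclasses import dataclass
from typing import Literal

# Hypothetical layout for one Ditto-1M editing triplet; field names and
# task-type labels are illustrative only, not the official schema.
@dataclass
class EditingTriplet:
    source_video_path: str   # source clip, ~101 frames at 1280x720 on average
    instruction: str         # natural-language editing instruction
    edited_video_path: str   # edited result produced from the source clip
    task_type: Literal["global_style", "global_freeform", "local_edit"]

# Example usage with placeholder paths.
sample = EditingTriplet(
    source_video_path="videos/source/000001.mp4",
    instruction="Turn the scene into a snowy winter evening.",
    edited_video_path="videos/edited/000001.mp4",
    task_type="global_freeform",
)
print(sample.instruction)
```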
Dataset Example
