FramePack Low Video Memory Video Generation Demo

Date

a year ago

1. Tutorial Introduction

FramePack is an open-source video generation framework released in April 2025 by Lvmin Zhang's team, the authors of ControlNet. Its neural network architecture addresses the high memory consumption, drift, and forgetting that affect traditional video generation, and it significantly lowers hardware requirements. The related paper is "Packing Input Frame Context in Next-Frame Prediction Models for Video Generation".

This tutorial uses an RTX 4090 as its computing resource.

Effect examples

Project Requirements

NVIDIA GPUs in the RTX 30XX, 40XX, or 50XX series that support fp16 and bf16. The GTX 10XX and 20XX series have not been tested.
Linux or Windows operating system.
At least 6GB of GPU memory.

To generate 1 minute of video (60 seconds) at 30fps (1800 frames) using the 13B model, the minimum GPU memory required is 6GB.
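The requirements above can be checked before starting a run. The following is a minimal sketch, not part of FramePack itself: the `GpuInfo` class and the `check_gpu` and `probe_first_gpu` helpers are hypothetical names introduced here for illustration, and the probe assumes PyTorch is installed (it returns `None` otherwise). The 6GB threshold is interpreted as decimal gigabytes so that cards reporting slightly less than 6 GiB still pass.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_VRAM_BYTES = 6 * 1000**3  # "At least 6GB of GPU memory", read as decimal GB


@dataclass
class GpuInfo:
    name: str
    total_memory_bytes: int
    supports_bf16: bool


def check_gpu(info: GpuInfo) -> List[str]:
    """Return a list of problems; an empty list means the listed requirements are met."""
    problems = []
    if info.total_memory_bytes < MIN_VRAM_BYTES:
        gb = info.total_memory_bytes / 1000**3
        problems.append(f"{info.name}: only {gb:.1f} GB of GPU memory, need at least 6 GB")
    if not info.supports_bf16:
        problems.append(f"{info.name}: bf16 is not reported as supported")
    return problems


def probe_first_gpu() -> Optional[GpuInfo]:
    """Read the first CUDA device through PyTorch, or return None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    props = torch.cuda.get_device_properties(0)
    return GpuInfo(props.name, props.total_memory, torch.cuda.is_bf16_supported())


info = probe_first_gpu()
if info is None:
    print("No CUDA GPU detected (or PyTorch is not installed).")
else:
    issues = check_gpu(info)
    print("GPU looks suitable." if not issues else "\n".join(issues))
```

The check only covers memory size and bf16 support as reported by PyTorch; it does not verify the GPU series.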

Regarding speed, an RTX 4090 desktop generates at about 2.5 seconds per frame (unoptimized) or 1.5 seconds per frame (with teacache). A laptop GPU, such as a 3070 Ti laptop or a 3060 laptop, is about 4 to 8 times slower. If your speed is much lower than this, troubleshoot your setup.

Because FramePack uses next-frame (next-frame-section) prediction, the generated frames appear as they are produced. You therefore get plenty of visual feedback before the entire video is finished.
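To put the per-frame speeds in perspective, the arithmetic can be worked through for the 1-minute, 30fps example. This is a rough sketch that assumes total time scales linearly with frame count and ignores model loading and any other overhead; the function names are illustrative only.

```python
def frame_count(seconds: float, fps: int) -> int:
    """Number of frames in a clip of the given length."""
    return round(seconds * fps)


def estimated_minutes(frames: int, seconds_per_frame: float) -> float:
    """Linear estimate of generation time in minutes."""
    return frames * seconds_per_frame / 60


frames = frame_count(60, 30)
print(f"frames: {frames}")  # frames: 1800

# Per-frame speeds quoted above for an RTX 4090 desktop.
for label, spf in [("unoptimized", 2.5), ("teacache", 1.5)]:
    desktop = estimated_minutes(frames, spf)
    # Laptop GPUs are described as roughly 4 to 8 times slower.
    print(f"{label}: desktop ~{desktop:.0f} min, "
          f"laptop ~{4 * desktop:.0f} to {8 * desktop:.0f} min")
# unoptimized: desktop ~75 min, laptop ~300 to 600 min
# teacache: desktop ~45 min, laptop ~180 to 360 min
```

Under these assumptions, a full minute of video takes on the order of an hour on an RTX 4090, which is why the progressive visual feedback during generation is useful.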

2. Operation steps

1. After starting the container, click the API address to enter the Web interface

If "Bad Gateway" is displayed, the model is still initializing. Because the model is large, wait about 1-2 minutes and then refresh the page.
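Instead of refreshing by hand, the wait can be scripted. The sketch below polls an address until it stops answering with an error such as 502 Bad Gateway. It uses only the Python standard library; the helper names, the 3-minute limit, and the 10-second interval are assumptions chosen for illustration, and the URL in the usage comment is a placeholder for the API address shown for your container.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout_s: float = 10.0) -> int:
    """Return the HTTP status of a GET request, or 0 if the connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 502 while the model is still loading
    except OSError:
        return 0  # connection refused, DNS failure, timeout, ...


def wait_until_ready(
    url: str,
    max_wait_s: float = 180.0,
    interval_s: float = 10.0,
    get_status: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url until it returns 200; give up after roughly max_wait_s seconds."""
    waited = 0.0
    while True:
        if get_status(url) == 200:
            return True
        if waited >= max_wait_s:
            return False
        sleep(interval_s)
        waited += interval_s


# Usage (placeholder address, replace with your container's API address):
# ready = wait_until_ready("https://<your-api-address>/")
# print("Web interface is up" if ready else "Still not ready, check the container logs")
```

The `get_status` and `sleep` parameters exist so the loop can be exercised without a network connection.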

2. Functional Demonstration

After uploading an image and entering a prompt, click "Start Generation" to generate the video.

Citation Information

Thanks to GitHub user boyswu for producing this tutorial. The project's citation information is as follows:

@article{zhang2025framepack,
    title={Packing Input Frame Contexts in Next-Frame Prediction Models for Video Generation},
    author={Lvmin Zhang and Maneesh Agrawala},
    journal={Arxiv},
    year={2025}
}

Exchange and discussion

🖌️ If you come across a high-quality project, send us a message to recommend it! We have also set up a tutorial exchange group. Scan the QR code and add the note [SD Tutorial] to join the group, discuss technical issues, and share your results.

This notebook is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at support@hyper.ai for prompt review and removal.

