HyperAIHyperAI

Command Palette

Search for a command to run...

3 months ago

RelayFormer: A Unified Local-Global Attention Framework for Scalable Image and Video Manipulation Localization

Wen Huang Jiarui Yang Tao Dai Jiawei Li Shaoxiong Zhan Bin Wang Shu-Tao Xia

RelayFormer: A Unified Local-Global Attention Framework for Scalable Image and Video Manipulation Localization

Abstract

Visual manipulation localization (VML) -- across both images and videos -- is a crucial task in digital forensics that involves identifying tampered regions in visual content. However, existing methods often lack cross-modal generalization and struggle to handle high-resolution or long-duration inputs efficiently. We propose RelayFormer, a unified and modular architecture for visual manipulation localization across images and videos. By leveraging flexible local units and a Global-Local Relay Attention (GLoRA) mechanism, it enables scalable, resolution-agnostic processing with strong generalization. Our framework integrates seamlessly with existing Transformer-based backbones, such as ViT and SegFormer, via lightweight adaptation modules that require only minimal architectural changes, ensuring compatibility without disrupting pretrained representations. Furthermore, we design a lightweight, query-based mask decoder that supports one-shot inference across video sequences with linear complexity. Extensive experiments across multiple benchmarks demonstrate that our approach achieves state-of-the-art localization performance, setting a new baseline for scalable and modality-agnostic VML. Code is available at: this https URL.

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding
Ready-to-use GPUs
Best Pricing
Get Started

Hyper Newsletters

Subscribe to our latest updates
We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning
Powered by MailChimp
RelayFormer: A Unified Local-Global Attention Framework for Scalable Image and Video Manipulation Localization | Papers | HyperAI