HyperAIHyperAI

Command Palette

Search for a command to run...

5 months ago

HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics

Gueter Josmy Faure; Jia-Fong Yeh; Min-Hung Chen; Hung-Ting Su; Shang-Hong Lai; Winston H. Hsu

HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics

Abstract

Existing research often treats long-form videos as extended short videos, leading to several limitations: inadequate capture of long-range dependencies, inefficient processing of redundant information, and failure to extract high-level semantic concepts. To address these issues, we propose a novel approach that more accurately reflects human cognition. This paper introduces HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics, a model that simulates episodic memory accumulation to capture action sequences and reinforces them with semantic knowledge dispersed throughout the video. Our work makes two key contributions: First, we develop an Episodic COmpressor (ECO) that efficiently aggregates crucial representations from micro to semi-macro levels, overcoming the challenge of long-range dependencies. Second, we propose a Semantics ReTRiever (SeTR) that enhances these aggregated representations with semantic information by focusing on the broader context, dramatically reducing feature dimensionality while preserving relevant macro-level information. This addresses the issues of redundancy and lack of high-level concept extraction. Extensive experiments demonstrate that HERMES achieves state-of-the-art performance across multiple long-video understanding benchmarks in both zero-shot and fully-supervised settings.

Code Repositories

joslefaure/HERMES
Official
pytorch
Mentioned in GitHub

Benchmarks

BenchmarkMethodologyMetrics
video-classification-on-breakfastHERMES
Accuracy (%): 95.2
video-classification-on-coin-1HERMES
Accuracy (%): 93.5

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding
Ready-to-use GPUs
Best Pricing
Get Started

Hyper Newsletters

Subscribe to our latest updates
We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning
Powered by MailChimp
HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics | Papers | HyperAI