Temporal Aggregate Representations for Long-Range Video Understanding

Fadime Sener, Dipika Singhania, Angela Yao

Abstract

Future prediction, especially in long-range videos, requires reasoning from current and past observations. In this work, we address questions of temporal extent, scaling, and level of semantic abstraction with a flexible multi-granular temporal aggregation framework. We show that it is possible to achieve state of the art in both next-action and dense anticipation with simple techniques such as max-pooling and attention. To demonstrate the anticipation capabilities of our model, we conduct experiments on the Breakfast, 50Salads, and EPIC-Kitchens datasets, where we achieve state-of-the-art results. With minimal modifications, our model can also be extended to video segmentation and action recognition.
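
As a rough illustration of the idea described in the abstract (pooling the long-range observed past at several granularities and coupling it with recent observations via attention), the following is a minimal PyTorch sketch. It is not the authors' TempAgg implementation; the class name, feature dimensions, number of scales, and attention configuration are illustrative assumptions. The official code is available at dibschat/tempAgg.

```python
# Minimal sketch of multi-granular temporal aggregation for anticipation.
# Assumptions (not from the paper): hidden size, number of scales/heads,
# and the exact way recent features attend to the pooled past.
import torch
import torch.nn as nn

class TemporalAggregateSketch(nn.Module):
    def __init__(self, feat_dim=1024, hidden_dim=512, num_classes=48,
                 span_scales=(2, 3, 5)):
        super().__init__()
        self.span_scales = span_scales  # how many chunks the past is split into, per scale
        self.proj = nn.Linear(feat_dim, hidden_dim)
        # one attention module per scale: recent snippet attends to the pooled past
        self.attn = nn.ModuleList([
            nn.MultiheadAttention(hidden_dim, num_heads=4, batch_first=True)
            for _ in span_scales
        ])
        self.classifier = nn.Linear(hidden_dim * len(span_scales), num_classes)

    def forward(self, past_feats, recent_feats):
        # past_feats:   (B, T_past, feat_dim)   pre-extracted features of the observed video
        # recent_feats: (B, T_recent, feat_dim) features of the most recent snippet
        recent = self.proj(recent_feats)
        fused = []
        for scale, attn in zip(self.span_scales, self.attn):
            # split the past into `scale` chunks and max-pool each chunk over time
            chunks = torch.chunk(self.proj(past_feats), scale, dim=1)
            pooled = torch.stack([c.max(dim=1).values for c in chunks], dim=1)  # (B, scale, H)
            # couple the recent snippet with the multi-granular pooled past
            out, _ = attn(query=recent, key=pooled, value=pooled)
            fused.append(out.mean(dim=1))  # (B, H)
        return self.classifier(torch.cat(fused, dim=1))  # next-action logits

# Example usage with random features (batch of 2 videos):
model = TemporalAggregateSketch()
past = torch.randn(2, 90, 1024)    # 90 observed frame/snippet features
recent = torch.randn(2, 10, 1024)  # 10 most recent features
logits = model(past, recent)       # shape (2, num_classes)
```

The design point this sketch tries to convey is that the "long-range" reasoning comes from cheap pooled summaries of the past at several temporal granularities, while attention lets the recent observation select which granularity and which part of the past is relevant for predicting the next action.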

Code Repositories

dibschat/tempAgg (Official, PyTorch)

Benchmarks

Benchmark: action-anticipation-on-assembly101
Methodology: TempAgg
Metrics:
  Actions Recall@5: 8.53
  Objects Recall@5: 26.27
  Verbs Recall@5: 59.11
