Xu Mingze; Xiong Yuanjun; Chen Hao; Li Xinyu; Xia Wei; Tu Zhuowen; Soatto Stefano

Abstract
We present Long Short-term TRansformer (LSTR), a temporal modeling algorithm for online action detection, which employs a long- and short-term memory mechanism to model prolonged sequence data. It consists of an LSTR encoder that dynamically leverages coarse-scale historical information from an extended temporal window (e.g., 2048 frames spanning up to 8 minutes), together with an LSTR decoder that focuses on a short time window (e.g., 32 frames spanning 8 seconds) to model the fine-scale characteristics of the data. Compared to prior work, LSTR provides an effective and efficient method to model long videos with fewer heuristics, which is validated by extensive empirical analysis. LSTR achieves state-of-the-art performance on three standard online action detection benchmarks: THUMOS'14, TVSeries, and HACS Segment. Code has been made available at: https://xumingze0308.github.io/projects/lstr
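To make the two temporal windows in the abstract concrete, the following is a minimal sketch of a streaming buffer that keeps a short window of recent frames and a longer window of older frames. It is an illustration only, not the authors' implementation: the class name `TwoScaleMemory`, its methods, and the rule that a frame moves into the long window once it leaves the short one are assumptions made for this example. Only the window sizes (2048 and 32 frames) come from the abstract.

```python
from collections import deque


class TwoScaleMemory:
    """Hypothetical FIFO buffer with a long and a short temporal window.

    Assumption for illustration: a frame that falls out of the short
    window is appended to the long window, so the two never overlap.
    """

    def __init__(self, long_len=2048, short_len=32):
        # deque(maxlen=...) silently drops the oldest item when full.
        self.long_term = deque(maxlen=long_len)
        self.short_term = deque(maxlen=short_len)

    def push(self, frame_feature):
        # If the short window is full, its oldest frame is about to be
        # evicted, so hand it over to the long window first.
        if len(self.short_term) == self.short_term.maxlen:
            self.long_term.append(self.short_term[0])
        self.short_term.append(frame_feature)

    def windows(self):
        # Both lists are ordered oldest to newest.
        return list(self.long_term), list(self.short_term)


# Usage: stream 3000 frames, using the frame index as a stand-in feature.
memory = TwoScaleMemory()
for t in range(3000):
    memory.push(t)

long_window, short_window = memory.windows()
print(len(long_window), len(short_window))  # 2048 32
```

In a model along the lines the abstract describes, the long window would be what the encoder summarizes and the short window what the decoder attends over. The buffer above only shows how the two windows could be maintained as frames arrive.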
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| online-action-detection-on-thumos-14 | LSTR | mAP: 69.5 |
| online-action-detection-on-tvseries | LSTR | mCAP: 89.1 |