Stable Mean Teacher for Semi-supervised Video Action Detection

Kumar Akash ; Mitra Sirshapan ; Rawat Yogesh Singh

Abstract

In this work, we focus on semi-supervised learning for video action detection. Video action detection requires spatiotemporal localization in addition to classification, and a limited amount of labels makes the model prone to unreliable predictions. We present Stable Mean Teacher, a simple end-to-end teacher-based framework that benefits from improved and temporally consistent pseudo labels. It relies on a novel Error Recovery (EoR) module, which learns from students' mistakes on labeled samples and transfers this knowledge to the teacher to improve pseudo labels for unlabeled samples. Moreover, existing spatiotemporal losses do not take temporal coherency into account and are prone to temporal inconsistencies. To address this, we present Difference of Pixels (DoP), a simple and novel constraint focused on temporal consistency, leading to coherent temporal detections. We evaluate our approach on four different spatiotemporal detection benchmarks: UCF101-24, JHMDB21, AVA, and YouTube-VOS. Our approach outperforms the supervised baselines for action detection by an average margin of 23.5% on UCF101-24, 16% on JHMDB21, and 3.3% on AVA. Using merely 10% and 20% of data, it provides competitive performance compared to the supervised baseline trained on 100% annotations on UCF101-24 and JHMDB21, respectively. We further evaluate its effectiveness on AVA for scaling to large-scale datasets and YouTube-VOS for video object segmentation, demonstrating its generalization capability to other tasks in the video domain. Code and models are publicly available.
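The abstract names two ingredients that a small sketch can make concrete: a teacher that tracks the student, and a constraint on frame-to-frame changes in the predictions. The Python below is only an illustration under stated assumptions. `ema_update` is the standard Mean Teacher exponential moving average, which the abstract does not spell out for this paper. `dop_style_loss` is one plausible reading of a "difference of pixels" constraint (matching the student's temporal differences to the teacher's) and is not the paper's formulation. All function names, the `alpha` value, and the flat list-of-floats frame layout are hypothetical, and the EoR module is not modeled at all.

```python
def ema_update(teacher, student, alpha=0.99):
    """Standard Mean Teacher step: teacher <- alpha * teacher + (1 - alpha) * student.

    Weights are plain dicts of floats here; a real model would use tensors.
    """
    return {
        name: alpha * teacher[name] + (1.0 - alpha) * student[name]
        for name in teacher
    }


def temporal_differences(frames):
    """Pixel-wise difference between each pair of consecutive frames."""
    return [
        [nxt - cur for cur, nxt in zip(frame_a, frame_b)]
        for frame_a, frame_b in zip(frames, frames[1:])
    ]


def dop_style_loss(student_frames, teacher_frames):
    """Mean squared error between student and teacher temporal differences.

    Assumption: an illustrative temporal-consistency penalty, not the
    Difference of Pixels loss as defined in the paper.
    """
    s_diff = temporal_differences(student_frames)
    t_diff = temporal_differences(teacher_frames)
    total, count = 0.0, 0
    for s_frame, t_frame in zip(s_diff, t_diff):
        for s, t in zip(s_frame, t_frame):
            total += (s - t) ** 2
            count += 1
    return total / count if count else 0.0


# Hypothetical two-pixel localization maps over three frames.
steady = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
flicker = [[0.0, 1.0], [1.0, 1.0], [0.0, 1.0]]
consistent_loss = dop_style_loss(steady, steady)
flicker_loss = dop_style_loss(flicker, steady)
```

In this toy case a student whose prediction flickers between frames is penalized, while one that changes in step with the teacher's pseudo label is not, which is the kind of temporal coherence the abstract attributes to DoP.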

Code Repositories

AKASH2907/stable-mean-teacher (official, PyTorch)
akash2907/stable_mean_teacher (official, PyTorch)

Benchmarks

Benchmark: action-detection-on-ucf101-24
Methodology: Stable Mean Teacher (I3D)
Metrics: Frame-mAP 0.5: 73.9; Video-mAP 0.5: 76.3
