COMEDIAN: Self-Supervised Learning and Knowledge Distillation for Action Spotting using Transformers

Julien Denize, Mykola Liashuha, Jaonary Rabarisoa, Astrid Orcesi, Romain Hérault

Abstract

We present COMEDIAN, a novel pipeline to initialize spatiotemporal transformers for action spotting, which involves self-supervised learning and knowledge distillation. Action spotting is a timestamp-level temporal action detection task. Our pipeline consists of three steps, with two initialization stages. First, we perform self-supervised initialization of a spatial transformer using short videos as input. Additionally, we initialize a temporal transformer that enhances the spatial transformer's outputs with global context through knowledge distillation from a pre-computed feature bank aligned with each short video segment. In the final step, we fine-tune the transformers to the action spotting task. The experiments, conducted on the SoccerNet-v2 dataset, demonstrate state-of-the-art performance and validate the effectiveness of COMEDIAN's pretraining paradigm. Our results highlight several advantages of our pretraining pipeline, including improved performance and faster convergence compared to non-pretrained models.
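
The second initialization stage can be pictured as pulling each output of the temporal transformer toward the pre-computed feature aligned with the same short video segment. The abstract does not state which loss is used, so the sketch below assumes a mean cosine-distance objective purely for illustration; the function names `cosine_similarity` and `distillation_loss` are hypothetical and are not taken from the eztorch repository.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def distillation_loss(student_outputs, bank_features):
    """Mean (1 - cosine similarity) over aligned segment pairs.

    student_outputs: one vector per short video segment (student side).
    bank_features:   the pre-computed feature for each of those segments.
    Assumption: cosine distance stands in for the unspecified loss.
    """
    if len(student_outputs) != len(bank_features):
        raise ValueError("each segment needs exactly one bank feature")
    distances = [
        1.0 - cosine_similarity(s, t)
        for s, t in zip(student_outputs, bank_features)
    ]
    return sum(distances) / len(distances)
```

Under this assumption the loss is zero when every student output points in the same direction as its bank feature, and it grows as the two diverge.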

Code Repositories

juliendenize/eztorch (official, PyTorch)

Benchmarks

Benchmark                        Methodology               Average-mAP  Tight Average-mAP
action-spotting-on-soccernet-v2  COMEDIAN (ViSwin T ens.)  77.6         73.1
action-spotting-on-soccernet-v2  COMEDIAN (ViViT T)        76.1         70.7
action-spotting-on-soccernet-v2  COMEDIAN (ViViT T ens.)   77.1         72.0
action-spotting-on-soccernet-v2  COMEDIAN (ViSwin T)       76.6         71.6
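
Both metrics credit a predicted timestamp only when it falls close enough in time to an annotated action, with the tight variant using smaller tolerances. The sketch below illustrates that matching step for a single tolerance. It is not the official SoccerNet evaluation code: the function name `count_true_positives`, the greedy highest-score-first matching, and the convention that a prediction must lie within half the tolerance of a ground-truth timestamp are all assumptions made for illustration.

```python
def count_true_positives(predictions, ground_truth, tolerance):
    """Count predictions matched to a ground-truth timestamp.

    predictions:  list of (timestamp_seconds, confidence) pairs.
    ground_truth: list of annotated timestamps in seconds.
    tolerance:    width in seconds of the window centred on each
                  annotation (assumed convention).
    Each annotation can be matched at most once; predictions are
    considered in order of decreasing confidence.
    """
    matched = set()
    true_positives = 0
    ranked = sorted(predictions, key=lambda p: p[1], reverse=True)
    for t_pred, _score in ranked:
        best_index, best_dist = None, None
        for i, t_gt in enumerate(ground_truth):
            if i in matched:
                continue
            dist = abs(t_pred - t_gt)
            if dist <= tolerance / 2 and (best_dist is None or dist < best_dist):
                best_index, best_dist = i, dist
        if best_index is not None:
            matched.add(best_index)
            true_positives += 1
    return true_positives
```

A full mAP computation would repeat this matching at every confidence threshold to build a precision-recall curve per action class, then average over classes and over a range of tolerances.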
