A Better Baseline for AVA

Rohit Girdhar; João Carreira; Carl Doersch; Andrew Zisserman

Abstract

We introduce a simple baseline for action localization on the AVA dataset. The model builds upon the Faster R-CNN bounding box detection framework, adapted to operate on pure spatiotemporal features, in our case produced exclusively by an I3D model pretrained on Kinetics. This model obtains 21.9% average AP on the validation set of AVA v2.1, up from 14.5% for the best RGB spatiotemporal model used in the original AVA paper (which was pretrained on Kinetics and ImageNet), and up from 11.3% for the publicly available baseline using a ResNet-101 image feature extractor pretrained on ImageNet. Our final model obtains 22.8%/21.9% mAP on the val/test sets and outperforms all submissions to the AVA challenge at CVPR 2018.
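To make the architecture concrete, the sketch below illustrates the core idea of region-based classification on top of spatiotemporal features: collapse an I3D feature map over time, then pool each proposal box into a fixed-size descriptor. This is a minimal NumPy illustration, not the paper's implementation; the temporal-averaging step, the `(T, H, W, C)` layout, and the crude max-pooling ROI scheme are assumptions made for clarity (the actual model uses Faster R-CNN-style ROI pooling with a region proposal network).

```python
import numpy as np

def temporal_average(features):
    """Collapse the time axis of a spatiotemporal feature map.

    `features` has an assumed layout (T, H, W, C); averaging over T is one
    simple way to obtain a 2D map on which per-frame boxes can be pooled.
    Returns an (H, W, C) array.
    """
    return features.mean(axis=0)

def roi_pool(feature_map, box, output_size=2):
    """Crude ROI max-pooling for illustration.

    Crops `box` = (y0, x0, y1, x1), given in feature-map cells, from an
    (H, W, C) map and max-pools the crop into an
    (output_size, output_size, C) grid.
    """
    y0, x0, y1, x1 = box
    crop = feature_map[y0:y1, x0:x1, :]
    h, w, c = crop.shape
    ys = np.linspace(0, h, output_size + 1).astype(int)
    xs = np.linspace(0, w, output_size + 1).astype(int)
    pooled = np.zeros((output_size, output_size, c))
    for i in range(output_size):
        for j in range(output_size):
            cell = crop[ys[i]:ys[i + 1], xs[j]:xs[j + 1], :]
            pooled[i, j] = cell.max(axis=(0, 1))
    return pooled

# Example: hypothetical I3D features for a clip (T=8 temporal steps,
# 14x14 spatial grid, 832 channels), one proposal box.
feats = np.random.rand(8, 14, 14, 832)
avg = temporal_average(feats)            # (14, 14, 832)
region = roi_pool(avg, (2, 2, 10, 10))   # (2, 2, 832) per-proposal descriptor
```

Each pooled descriptor would then be fed to a small classification head that predicts per-person action labels, mirroring the second stage of Faster R-CNN.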

Benchmarks

Benchmark                                  Methodology                                    Metric
action-recognition-in-videos-on-ava-v21    I3D w/ RPN + JFT (Kinetics-400 pretraining)    mAP (Val): 22.8
action-recognition-in-videos-on-ava-v21    I3D w/ RPN (Kinetics-400 pretraining)          mAP (Val): 21.9
