Action Classification On Kinetics 700

Evaluation Metrics

Top-1 Accuracy
Top-5 Accuracy
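
As a rough illustration of the two metrics listed above, the sketch below computes top-k accuracy from per-class scores: a prediction counts as correct when the true label is among the k highest-scoring classes. The function name `top_k_accuracy` and the toy scores are assumptions made for illustration, not taken from the benchmark or any listed paper. Real evaluation protocols usually also aggregate predictions over several clips per video, which is not shown here.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 1,
) -> float:
    """Return the percentage of samples whose true label is among
    the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one sample is required")

    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the highest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return 100.0 * hits / len(labels)


# Toy example: 4 clips, 3 classes (hypothetical scores, not benchmark data).
toy_scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.5, 0.4, 0.1],
    [0.2, 0.5, 0.3],
]
toy_labels = [0, 1, 1, 2]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
```

On Kinetics 700 the same computation would run over 700 classes with k=1 and k=5; the toy data uses k=2 only because it has three classes.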

Evaluation Results

Performance of each model on this benchmark

| Model | Top-1 Accuracy (%) | Top-5 Accuracy (%) | Paper Title | Repository |
| --- | --- | --- | --- | --- |
| InternVideo2-6B | 85.9 | - | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | |
| InternVideo2-1B | 85.4 | - | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | |
| InternVideo-T | 84.0 | - | InternVideo: General Video Foundation Models via Generative and Discriminative Learning | |
| TubeViT-L | 83.8 | 96.6 | Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning | |
| UMT-L (ViT-L/16) | 83.6 | 96.7 | Unmasked Teacher: Towards Training-Efficient Video Foundation Models | |
| MTV-H (WTS 60M) | 83.4 | 96.2 | Multiview Transformers for Video Recognition | |
| EVA | 82.9 | - | EVA: Exploring the Limits of Masked Visual Representation Learning at Scale | |
| UniFormerV2-L | 82.7 | 96.2 | UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer | - |
| CoCa (finetuned) | 82.7 | - | CoCa: Contrastive Captioners are Image-Text Foundation Models | |
| CoCa (frozen) | 81.1 | - | CoCa: Contrastive Captioners are Image-Text Foundation Models | |
| Hiera-H (no extra data) | 81.1 | - | Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles | |
| MaskFeat (no extra data, MViT-L) | 80.4 | 95.7 | Masked Feature Prediction for Self-Supervised Visual Pre-Training | |
| mPLUG-2 | 80.4 | 94.9 | mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | |
| AIM (CLIP ViT-L/14, 32x224) | 80.4 | - | AIM: Adapting Image Models for Efficient Video Action Recognition | |
| CoVeR (JFT-3B) | 79.8 | 94.9 | Co-training Transformer with Videos and Images Improves Action Recognition | - |
| MViTv2-L (ImageNet-21k pretrain) | 79.4 | 94.9 | MViTv2: Improved Multiscale Vision Transformers for Classification and Detection | |
| MoViNet-A6 | 79.4 | - | MViTv2: Improved Multiscale Vision Transformers for Classification and Detection | |
| CoVeR (JFT-300M) | 78.5 | 94.2 | Co-training Transformer with Videos and Images Improves Action Recognition | - |
| MViTv2-B | 76.6 | 93.2 | MViTv2: Improved Multiscale Vision Transformers for Classification and Detection | |
| MoViNet-A6 | 72.3 | - | MoViNets: Mobile Video Networks for Efficient Video Recognition | |