Action Classification on Kinetics-400

Evaluation Metric

Acc@1
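
Acc@1 (top-1 accuracy) is the share of evaluated videos whose highest-scoring predicted class matches the ground-truth label, reported here as a percentage. A minimal sketch of that computation follows; the function name `top1_accuracy` and the toy scores are illustrative assumptions, not part of the benchmark's own tooling.

```python
def top1_accuracy(scores, labels):
    """Return top-1 accuracy in percent.

    scores: one list of per-class scores per video (hypothetical toy input).
    labels: the ground-truth class index for each video.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    correct = 0
    for row, label in zip(scores, labels):
        # Index of the highest-scoring class for this video.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Toy example: 4 videos, 3 classes; the last video is misclassified.
scores = [
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
    [0.5, 0.4, 0.1],
]
labels = [1, 0, 2, 1]
print(top1_accuracy(scores, labels))  # 75.0
```

Published Kinetics-400 numbers are often obtained by averaging scores over several clips and crops per video before taking the arg-max; the sketch assumes that aggregation has already happened.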

Results

Performance of each model on this benchmark

Model | Acc@1 | Paper Title | Repository
OmniVec2 | 93.6 | OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning | -
FTP-UniFormerV2-L/14 | 93.4 | Enhancing Video Transformers for Action Understanding with VLM-aided Training | -
InternVideo2-6B | 92.1 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding |
InternVideo2-1B | 91.6 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding |
OmniVec | 91.1 | OmniVec: Learning robust representations with cross modal sharing | -
InternVideo | 91.1 | InternVideo: General Video Foundation Models via Generative and Discriminative Learning |
TubeViT-H (ImageNet-1k) | 90.9 | Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning |
UMT-L (ViT-L/16) | 90.6 | Unmasked Teacher: Towards Training-Efficient Video Foundation Models |
Unmasked Teacher (ViT-L) | 90.6 | Unmasked Teacher: Towards Training-Efficient Video Foundation Models |
TubeVit-L (ImageNet-1k) | 90.2 | Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning |
UniFormerV2-L (ViT-L, 336) | 90.0 | UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer | -
VideoMAE V2-g (64x266x266) | 90.0 | VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking |
MTV-H (WTS 60M) | 89.9 | Multiview Transformers for Video Recognition |
TAdaFormer-L/14 | 89.9 | Temporally-Adaptive Models for Efficient Video Understanding |
EVA | 89.7 | EVA: Exploring the Limits of Masked Visual Representation Learning at Scale |
AM/12 ViT-B Dinov2 | 89.6 | AM Flow: Adapters for Temporal Processing in Action Recognition | -
ATM | 89.4 | What Can Simple Arithmetic Operations Do for Temporal Modeling? |
CoCa (finetuned) | 88.9 | CoCa: Contrastive Captioners are Image-Text Foundation Models |
ILA (ViT-L/14) | 88.7 | Implicit Temporal Modeling with Learnable Alignment for Video Recognition |
BIKE (CLIP ViT-L/14) | 88.7 | Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models |