Action Classification On Moments In Time

Evaluation Metrics

Top 1 Accuracy
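
As a rough illustration of the metric, Top 1 Accuracy is the share of test clips whose highest-scoring predicted class equals the ground-truth label, reported as a percentage. The sketch below is only an assumption about how such a score is typically computed; the function name `top1_accuracy` and the toy scores and labels are hypothetical and do not come from the benchmark or from any of the listed papers.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return the percentage of samples whose top-scoring class matches the label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    correct = 0
    for row, label in zip(scores, labels):
        # argmax over class scores; ties resolve to the lowest class index
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Hypothetical toy data: 4 clips, 3 classes (not real benchmark outputs).
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.2, 0.6],
    [0.6, 0.3, 0.1],
]
labels = [1, 0, 1, 2]
accuracy = top1_accuracy(scores, labels)
print(f"{accuracy:.1f}")  # prints 50.0 (2 of 4 clips correct)
```

Published numbers may additionally depend on evaluation details such as the number of sampled clips or crops per video, which this sketch does not model.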

Results

Performance of each model on this benchmark

| Model | Top 1 Accuracy (%) | Paper Title | Repository |
| --- | --- | --- | --- |
| OmniVec2 | 53.1 | OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning | - |
| InternVideo2-1B | 50.9 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | |
| UMT-L (ViT-L/16) | 48.7 | Unmasked Teacher: Towards Training-Efficient Video Foundation Models | |
| UniFormerV2-L | 47.8 | UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer | - |
| MTV-H (WTS 60M) | 47.2 | Multiview Transformers for Video Recognition | |
| CoVeR(JFT-3B) | 46.1 | Co-training Transformer with Videos and Images Improves Action Recognition | - |
| CoVeR(JFT-300M) | 45.0 | Co-training Transformer with Videos and Images Improves Action Recognition | - |
| VATT-Large | 41.1 | VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text | |
| MoViNet-A6 | 40.2 | MoViNets: Mobile Video Networks for Efficient Video Recognition | |
| MoViNet-A5 | 39.1 | MoViNets: Mobile Video Networks for Efficient Video Recognition | |
| MoViNet-A4 | 37.9 | MoViNets: Mobile Video Networks for Efficient Video Recognition | |
| VTN | 37.4 | Video Transformer Network | |
| MBT (AV) | 37.3 | Attention Bottlenecks for Multimodal Fusion | |
| MoViNet-A3 | 35.6 | MoViNets: Mobile Video Networks for Efficient Video Recognition | |
| MoViNet-A2 | 34.3 | MoViNets: Mobile Video Networks for Efficient Video Recognition | |
| AssembleNet | 34.27 | AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures | |
| SRTG r3d-101 | 33.56 | Learn to cycle: Time-consistent feature discovery for action recognition | |
| CoST (ResNet-101, 32 frames) | 32.4 | Collaborative Spatiotemporal Feature Learning for Video Action Recognition | - |
| MoViNet-A1 | 32.0 | MoViNets: Mobile Video Networks for Efficient Video Recognition | |
| EvaNet | 31.8 | Evolving Space-Time Neural Architectures for Videos | - |