Self-Supervised Action Recognition on UCF101

Evaluation Metrics

3-fold Accuracy
Frozen (whether the pre-trained backbone is kept frozen during evaluation rather than fine-tuned)
Pre-Training Dataset
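3-fold accuracy on UCF101 is usually understood as the mean top-1 accuracy over the dataset's three official train/test splits. The sketch below illustrates that averaging. It assumes per-split predictions and labels are already available as integer class IDs. The function names and the toy predictions are hypothetical and are not taken from any of the listed papers.

```python
from statistics import mean


def split_accuracy(predictions, labels):
    """Top-1 accuracy (%) on a single test split."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and of equal length")
    # Summing booleans counts the correctly classified clips.
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


def three_fold_accuracy(splits):
    """Mean top-1 accuracy (%) over the three UCF101 splits.

    `splits` is a list of (predictions, labels) pairs, one per split
    (hypothetical input format chosen for this sketch).
    """
    if len(splits) != 3:
        raise ValueError("expected exactly three splits")
    return mean(split_accuracy(p, y) for p, y in splits)


# Hypothetical toy predictions, for illustration only.
splits = [
    ([0, 1, 2, 2], [0, 1, 2, 1]),  # split 1: 3 of 4 correct -> 75.0
    ([0, 0, 1, 1], [0, 0, 1, 1]),  # split 2: 4 of 4 correct -> 100.0
    ([3, 1, 2, 0], [3, 1, 0, 0]),  # split 3: 3 of 4 correct -> 75.0
]
print(round(three_fold_accuracy(splits), 2))  # 83.33
```

Averaging per-split accuracies (rather than pooling all test clips) gives each split equal weight, which matches how the three-split protocol is commonly reported.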

Benchmark Results

Results of each model on this benchmark:

| Model | 3-fold Accuracy (%) | Frozen | Pre-Training Dataset | Paper Title |
|---|---|---|---|---|
| VideoMAE V2-g | 99.6 | - | - | VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking |
| MVD (ViT-B) | 97.5 | false | Kinetics400 | Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning |
| SSL-KD (R21D-18) | 97.3 | false | Kinetics400 | A Large-Scale Analysis on Self-Supervised Video Representation Learning |
| M3Video | 96.5 | false | Kinetics400 | Masked Motion Encoding for Self-Supervised Video Representation Learning |
| pBYOL | 96.3 | false | Kinetics400 | A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning |
| VideoMAE | 96.1 | false | Kinetics400 | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training |
| SCE (R3D-50) | 95.3 | false | Kinetics400 | Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning |
| MMV TSM-50x2 | 95.2 | false | Audioset + Howto100M | Self-Supervised MultiModal Versatile Networks |
| XKD (ViT-B/112/16) | 94.1 | - | Kinetics400 | XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning |
| CVRL (R3D-152 2x; K600) | 93.9 | false | Kinetics600 | Spatiotemporal Contrastive Video Representation Learning |
| RSPNet | 93.7 | false | Kinetics400 | RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning |
| XKD-Modality-Agnostic (ViT-B/112/16) | 93.4 | - | - | XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning |
| CVRL (R3D-50; K600) | 93.4 | false | Kinetics600 | Spatiotemporal Contrastive Video Representation Learning |
| VideoMS (ViT-B) | 93.4 | false | no extra data | EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens |
| BraVe:V-FA (TSM-50x2) | 93.1 | false | - | Broaden Your Views for Self-Supervised Video Learning |
| CrissCross (AudioSet) | 92.4 | false | AudioSet | Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity |
| CVRL (R3D-50; K400) | 92.2 | false | Kinetics400 | Spatiotemporal Contrastive Video Representation Learning |
| AVID+CMA (Modified R2+1D-18 on Audioset) | 91.5 | false | Audioset (Audio+Video) | Audio-Visual Instance Discrimination with Cross-Modal Agreement |
| CrissCross (Kinetics400) | 91.5 | false | Kinetics400 | Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity |
| VideoMAE(no extra data) | 91.3 | false | no extra data | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training |