Self-Supervised Action Recognition on HMDB51

Evaluation Metrics

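Frozen (whether the pre-trained encoder is kept frozen during evaluation; false means it is fine-tuned)
Pre-Training Dataset (data used for self-supervised pre-training)
Top-1 Accuracy (%)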
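Top-1 accuracy counts a test clip as correct when the class with the highest predicted score matches its ground-truth label, and reports the fraction of correct clips as a percentage. The sketch below is a minimal illustration that assumes one score vector per clip. Individual papers may aggregate differently (for example, averaging scores over several views of a clip, or averaging over HMDB51's evaluation splits), so it is not a reproduction of any specific paper's protocol. The function name `top1_accuracy` is hypothetical.

```python
from typing import Sequence


def top1_accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return the percentage of clips whose highest-scoring class equals the label.

    Assumes one score vector per clip (hypothetical helper, not a library API).
    """
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    correct = 0
    for scores, label in zip(logits, labels):
        # argmax over class scores; on ties, max() keeps the lowest class index
        predicted = max(range(len(scores)), key=lambda c: scores[c])
        correct += int(predicted == label)
    return 100.0 * correct / len(labels)


# Four clips, three classes: predictions are 1, 0, 2, 0, so 3 of 4 match.
print(top1_accuracy(
    [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6], [0.9, 0.05, 0.05]],
    [1, 0, 0, 0],
))  # prints 75.0
```

On HMDB51 each score vector would have 51 entries, one per action class.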

Results

Performance of each model on this benchmark:

Model | Frozen | Pre-Training Dataset | Top-1 Accuracy (%) | Paper Title
MVD (ViT-B) | false | Kinetics400 | 79.7 | Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning
M3Video | false | Kinetics400 | 78.0 | Masked Motion Encoding for Self-Supervised Video Representation Learning
pBYOL | false | Kinetics400 | 75.0 | A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning
SCE (R3D-50) | false | Kinetics400 | 74.7 | Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning
VideoMAE | false | Kinetics400 | 73.3 | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
BraVe:V-FA (TSM-50x2) | false | - | 70.5 | Broaden Your Views for Self-Supervised Video Learning
CVRL (R3D-152 2x; K600) | false | Kinetics600 | 69.9 | Spatiotemporal Contrastive Video Representation Learning
XKD (ViT-B/112/16) | - | - | 69 | XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning
XDC | false | IG-Kinetics | 68.9 | Self-Supervised Learning by Cross-Modal Audio-Video Clustering
CVRL (R3D-50; K600) | false | Kinetics600 | 68.0 | Spatiotemporal Contrastive Video Representation Learning
CrissCross (AudioSet) | false | AudioSet | 66.8 | Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity
CVRL (R3D-50; K400) | false | Kinetics400 | 66.7 | Spatiotemporal Contrastive Video Representation Learning
XDC | false | IG-Random | 66.5 | Self-Supervised Learning by Cross-Modal Audio-Video Clustering
XKD-Modality-Agnostic (ViT-B/112/16) | - | - | 65.9 | XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning
VideoMS (ViT-B) | false | no extra data | 65.8 | EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens
RSPNet | false | Kinetics400 | 64.7 | RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning
CrissCross (Kinetics400) | false | Kinetics400 | 64.7 | Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity
AVID+CMA (Modified R2+1D-18 on Audioset) | false | Audioset (Video+Audio) | 64.7 | Audio-Visual Instance Discrimination with Cross-Modal Agreement
ELo | false | - | 64.5 | Evolving Losses for Unsupervised Video Representation Learning
AVID (Modified R2+1D-18 on Audioset) | false | Audioset (Video+Audio) | 64.1 | Audio-Visual Instance Discrimination with Cross-Modal Agreement
Self Supervised Action Recognition On Hmdb51 | SOTA | HyperAI超神经