
Abstract
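In this paper we discuss several forms of spatiotemporal convolution for video analysis and study their effects on action recognition. Our motivation stems from the observation that 2D CNNs applied to individual frames of a video have remained solid performers in action recognition. In this work we empirically demonstrate the accuracy advantages of 3D CNNs over 2D CNNs within the framework of residual learning. Furthermore, we show that factorizing the 3D convolutional filters into separate spatial and temporal components yields significant gains in accuracy. Our empirical study leads to the design of a new spatiotemporal convolutional block, "R(2+1)D", which produces CNNs that achieve results comparable or superior to the state of the art on Sports-1M, Kinetics, UCF101, and HMDB51.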
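The factorization can be made concrete by counting weights. The sketch below assumes the parameter-matching rule from the full paper: a t×d×d 3D convolution from N_in to N_out channels is replaced by a 1×d×d spatial convolution into M intermediate channels, followed by a t×1×1 temporal convolution into N_out channels, with M = ⌊t·d²·N_in·N_out / (d²·N_in + t·N_out)⌋ chosen so that both versions have about the same number of weights. The function names are hypothetical, biases are ignored, and the normalization and ReLU placed between the two convolutions are not modeled.

```python
def midplanes(n_in: int, n_out: int, t: int = 3, d: int = 3) -> int:
    """Intermediate channel count M for a (2+1)D block (hypothetical helper).

    M is chosen so that a 1 x d x d spatial conv (n_in -> M) followed by a
    t x 1 x 1 temporal conv (M -> n_out) has roughly as many weights as a
    full t x d x d 3D conv (n_in -> n_out). Integer division floors M.
    """
    return (t * d * d * n_in * n_out) // (d * d * n_in + t * n_out)


def conv3d_params(n_in: int, n_out: int, t: int = 3, d: int = 3) -> int:
    # Weights of a full t x d x d 3D convolution (bias ignored).
    return t * d * d * n_in * n_out


def conv2plus1d_params(n_in: int, n_out: int, t: int = 3, d: int = 3) -> int:
    # Weights of the factorized version (bias ignored):
    # spatial 1 x d x d conv (n_in -> m), then temporal t x 1 x 1 conv (m -> n_out).
    m = midplanes(n_in, n_out, t, d)
    return d * d * n_in * m + t * m * n_out


if __name__ == "__main__":
    for n_in, n_out in [(64, 64), (64, 128)]:
        # Prints: n_in, n_out, M, 3D weights, (2+1)D weights
        print(n_in, n_out, midplanes(n_in, n_out),
              conv3d_params(n_in, n_out), conv2plus1d_params(n_in, n_out))
    # 64 64 144 110592 110592
    # 64 128 230 221184 220800
```

With N_in = N_out = 64 and t = d = 3 the two counts match exactly; for other channel pairs, flooring M keeps the factorized block at or slightly below the 3D budget, so a comparison between the two is not skewed by extra capacity in the (2+1)D version.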
Code repositories
wasilone11/ICPR-RIP-2024
Mentioned in GitHub
AD2605/Action-Recognition
pytorch
Mentioned in GitHub
JuliBaCSE/WeCare_makeathon
pytorch
Mentioned in GitHub
juenkhaw/action_recognition_project1
pytorch
Mentioned in GitHub
karatuno/Action-Recognition
tf
Mentioned in GitHub
2023-MindSpore-1/ms-code-68
mindspore
Mentioned in GitHub
anonymous-p/Flickering_Adversarial_Video
pytorch
Mentioned in GitHub
3dperceptionlab/visual-wetlandbirds
pytorch
Mentioned in GitHub
facebookresearch/R2Plus1D
Official
caffe2
Mentioned in GitHub
2023-MindSpore-1/ms-code-187
mindspore
Mentioned in GitHub
kingcong/r2plus1d
mindspore
fmthoker/severe-benchmark
pytorch
Mentioned in GitHub
juenkhaw/action_recognition_project
pytorch
Mentioned in GitHub
leftthomas/r2plus1d-c3d
pytorch
Mentioned in GitHub
kietngt00/hmdb51-recognition
pytorch
Mentioned in GitHub
BelixRogner/SpeedChallenge
pytorch
Mentioned in GitHub
facebookresearch/VMZ
caffe2
Mentioned in GitHub
Bangbangbanana/r2plus1d_mindspore
mindspore
Mentioned in GitHub
open-mmlab/mmaction2
pytorch
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| action-classification-on-kinetics-400 | R[2+1]D-RGB (Sports-1M pretrain) | Acc@1: 74.3, Acc@5: 91.4 |
| action-classification-on-kinetics-400 | R[2+1]D-RGB | Acc@1: 72, Acc@5: 90 |
| action-classification-on-kinetics-400 | R[2+1]D-Two-Stream | Acc@1: 73.9, Acc@5: 90.9 |
| action-classification-on-kinetics-400 | R[2+1]D | Acc@1: 72, Acc@5: 90 |
| action-classification-on-kinetics-400 | R[2+1]D-Flow | Acc@1: 67.5, Acc@5: 87.2 |
| action-classification-on-kinetics-400 | R[2+1]D-Flow (Sports-1M pretrain) | Acc@1: 75.4, Acc@5: 91.9 |
| action-recognition-in-videos-on-hmdb-51 | R[2+1]D-Flow (Kinetics pretrained) | Average accuracy of 3 splits: 76.4 |
| action-recognition-in-videos-on-hmdb-51 | R[2+1]D-RGB (Sports1M pretrained) | Average accuracy of 3 splits: 66.6 |
| action-recognition-in-videos-on-hmdb-51 | R[2+1]D-TwoStream (Kinetics pretrained) | Average accuracy of 3 splits: 78.7 |
| action-recognition-in-videos-on-hmdb-51 | R[2+1]D-RGB (Kinetics pretrained) | Average accuracy of 3 splits: 74.5 |
| action-recognition-in-videos-on-hmdb-51 | R[2+1]D-TwoStream (Sports1M pretrained) | Average accuracy of 3 splits: 72.7 |
| action-recognition-in-videos-on-hmdb-51 | R[2+1]D-Flow (Sports1M pretrained) | Average accuracy of 3 splits: 70.1 |
| action-recognition-in-videos-on-sports-1m | R[2+1]D-Two-Stream-32frame | Video hit@1: 73.3, Video hit@5: 91.9 |
| action-recognition-in-videos-on-sports-1m | R[2+1]D-RGB-32frame | Clip hit@1: 57, Video hit@1: 73, Video hit@5: 91.5 |
| action-recognition-in-videos-on-sports-1m | R[2+1]D-Flow-32frame | Clip hit@1: 46.4, Video hit@1: 68.4, Video hit@5: 88.7 |
| action-recognition-in-videos-on-ucf101 | R[2+1]D-Flow (Sports-1M pretrained) | 3-fold accuracy: 93.3 |
| action-recognition-in-videos-on-ucf101 | R[2+1]D-RGB (Sports-1M pretrained) | 3-fold accuracy: 93.6 |
| action-recognition-in-videos-on-ucf101 | R[2+1]D-Flow (Kinetics pretrained) | 3-fold accuracy: 95.5 |
| action-recognition-in-videos-on-ucf101 | R[2+1]D-TwoStream (Kinetics pretrained) | 3-fold accuracy: 97.3 |
| action-recognition-in-videos-on-ucf101 | R[2+1]D-RGB (Kinetics pretrained) | 3-fold accuracy: 96.8 |
| action-recognition-in-videos-on-ucf101 | R[2+1]D-TwoStream (Sports-1M pretrained) | 3-fold accuracy: 95 |