4 months ago

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

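Abstract

The paucity of videos in current action classification datasets (such as UCF-101 and HMDB-51) has made it difficult to identify good video architectures, because most methods obtain similar performance on the existing small-scale benchmarks. This paper re-evaluates state-of-the-art architectures in light of the new Kinetics Human Action Video dataset. Kinetics has two orders of magnitude more data than existing datasets, with 400 human action classes and over 400 clips per class, collected from realistic, challenging YouTube videos. We analyse how current architectures fare on the action classification task on this dataset, and how much their performance on the smaller benchmark datasets improves after pre-training on Kinetics. We also introduce a new Two-Stream Inflated 3D ConvNet (I3D) that is based on 2D ConvNet inflation: the filters and pooling kernels of very deep image classification ConvNets are expanded into 3D, making it possible to learn seamless spatio-temporal feature extractors from video while reusing successful ImageNet architecture designs and their parameters. We show that, after pre-training on Kinetics, I3D models considerably improve on the state of the art in action classification, reaching 80.9% on HMDB-51 and 98.0% on UCF-101.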
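
The abstract describes inflation only in words, so the following is a minimal sketch of one way to expand a 2D filter into 3D while reusing its parameters. It assumes that the 2D kernel is repeated along a new time axis and rescaled by 1/N (N being the temporal depth), so that a video made of one repeated frame produces the same response as the 2D filter does on that frame. The rescaling rule is not stated in the abstract, and the names `inflate_kernel`, `correlate_2d` and `correlate_3d` are hypothetical helpers written for this illustration (single channel, no padding, stride 1).

```python
def inflate_kernel(kernel_2d, time_depth):
    """Repeat a 2D kernel time_depth times along a new time axis, scaled by 1/time_depth."""
    return [[[w / time_depth for w in row] for row in kernel_2d]
            for _ in range(time_depth)]


def correlate_2d(image, kernel):
    """Valid cross-correlation of a 2D image (list of rows) with a 2D kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def correlate_3d(video, kernel):
    """Valid cross-correlation of a video (list of frames) with a 3D kernel."""
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out_t = len(video) - kt + 1
    out_h = len(video[0]) - kh + 1
    out_w = len(video[0][0]) - kw + 1
    return [[[sum(video[t + c][i + a][j + b] * kernel[c][a][b]
                  for c in range(kt) for a in range(kh) for b in range(kw))
              for j in range(out_w)]
             for i in range(out_h)]
            for t in range(out_t)]


# A toy 4x4 frame and a 3x3 edge-like filter (illustrative values only).
frame = [[1.0, 2.0, 0.0, 1.0],
         [0.0, 1.0, 3.0, 2.0],
         [2.0, 0.0, 1.0, 1.0],
         [1.0, 1.0, 0.0, 2.0]]
kernel_2d = [[1.0, 0.0, -1.0],
             [2.0, 0.0, -2.0],
             [1.0, 0.0, -1.0]]

# A "static" video: the same frame repeated five times.
static_video = [frame for _ in range(5)]

kernel_3d = inflate_kernel(kernel_2d, time_depth=3)
response_2d = correlate_2d(frame, kernel_2d)
response_3d = correlate_3d(static_video, kernel_3d)

# Every output time step of the 3D response should match the 2D response
# up to floating-point rounding.
max_diff = max(abs(response_3d[t][i][j] - response_2d[i][j])
               for t in range(len(response_3d))
               for i in range(len(response_2d))
               for j in range(len(response_2d[0])))
print(max_diff < 1e-9)
```

Under this assumption the inflated filter starts out behaving like the image filter on static input, which is what lets ImageNet-trained parameters serve as an initialisation before the temporal dimension is trained on video.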

Code repositories

helloxy96/CS5242_Project2020 (pytorch, mentioned in GitHub)
2023-MindSpore-1/ms-code-24 (mindspore, mentioned in GitHub)
yaohungt/GSTEG_CVPR_2019 (pytorch, mentioned in GitHub)
prinshul/GWSDR (tf, mentioned in GitHub)
dlpbc/keras-kinetics-i3d (tf, mentioned in GitHub)
OanaIgnat/i3d_keras (tf, mentioned in GitHub)
LukasHedegaard/co3d (pytorch, mentioned in GitHub)
mHealthBuet/SegCodeNet (pytorch, mentioned in GitHub)
KingGugu/I3D (mindspore, mentioned in GitHub)
hassony2/kinetics_i3d_pytorch (pytorch, mentioned in GitHub)
piergiaj/pytorch-i3d (pytorch, mentioned in GitHub)
JeffCHEN2017/WSSTG (pytorch, mentioned in GitHub)
CMU-CREATE-Lab/deep-smoke-machine (pytorch, mentioned in GitHub)
aim3-ruc/youmakeup_challenge2022 (pytorch, mentioned in GitHub)
Alexyuda/action_recognition (pytorch, mentioned in GitHub)
sebastiantiesmeyer/deeplabchop3d (pytorch, mentioned in GitHub)
daniansan/i3d_mindspore (mindspore, mentioned in GitHub)
PPPrior/i3d-pytorch (pytorch, mentioned in GitHub)
deepmind/kinetics-i3d (tf, mentioned in GitHub)
StanfordVL/RubiksNet (pytorch, mentioned in GitHub)

Benchmarks

Benchmark | Method | Metrics
action-classification-on-charades | I3D | MAP: 32.9
action-classification-on-kinetics-400 | I3D | Acc@1: 71.1; Acc@5: 89.3
action-classification-on-moments-in-time | I3D | Top 1 Accuracy: 29.51%; Top 5 Accuracy: 56.06%
action-classification-on-toyota-smarthome | I3D | CS: 53.4; CV1: 34.9; CV2: 45.1
action-recognition-in-videos-on-hmdb-51 | Flow-I3D (Kinetics pre-training) | Average accuracy of 3 splits: 77.3
action-recognition-in-videos-on-hmdb-51 | Two-stream I3D | Average accuracy of 3 splits: 80.9
action-recognition-in-videos-on-hmdb-51 | Two-Stream I3D (Imagenet+Kinetics pre-training) | Average accuracy of 3 splits: 80.7
action-recognition-in-videos-on-hmdb-51 | RGB-I3D (Kinetics pre-training) | Average accuracy of 3 splits: 74.3
action-recognition-in-videos-on-hmdb-51 | Flow-I3D (Imagenet+Kinetics pre-training) | Average accuracy of 3 splits: 77.1
action-recognition-in-videos-on-hmdb-51 | RGB-I3D (Imagenet+Kinetics pre-training) | Average accuracy of 3 splits: 74.8
action-recognition-in-videos-on-ucf101 | Two-Stream I3D (Kinetics pre-training) | 3-fold Accuracy: 97.8
action-recognition-in-videos-on-ucf101 | Flow-I3D (Imagenet+Kinetics pre-training) | 3-fold Accuracy: 96.7
action-recognition-in-videos-on-ucf101 | RGB-I3D (Kinetics pre-training) | 3-fold Accuracy: 95.1
action-recognition-in-videos-on-ucf101 | Two-stream I3D | 3-fold Accuracy: 93.4
action-recognition-in-videos-on-ucf101 | Two-Stream I3D (Imagenet+Kinetics pre-training) | 3-fold Accuracy: 98.0
action-recognition-in-videos-on-ucf101 | RGB-I3D (Imagenet+Kinetics pre-training) | 3-fold Accuracy: 95.6
action-recognition-in-videos-on-ucf101 | Flow-I3D (Kinetics pre-training) | 3-fold Accuracy: 96.5
hand-gesture-recognition-on-egogesture-1 | I3D | Accuracy: 92.78
hand-gesture-recognition-on-viva-hand-1 | I3D | Accuracy: 83.1
skeleton-based-action-recognition-on-j-hmdb | I3D | Accuracy (RGB+pose): 84.1
video-object-tracking-on-cater | I3D-50 + LSTM | L1: 1.2; Top 1 Accuracy: 60.2; Top 5 Accuracy: 81.8
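
Several of the benchmark entries report Acc@1 and Acc@5 (top-1 and top-5 accuracy). As a rough illustration of how such a metric is computed, the sketch below counts a prediction as correct when the true label is among the k highest-scoring classes. The function name `top_k_accuracy` and the toy scores and labels are hypothetical and are not taken from any benchmark above.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example: 4 clips, 3 classes (made-up scores, for illustration only).
scores = [[0.1, 0.7, 0.2],
          [0.5, 0.3, 0.2],
          [0.2, 0.3, 0.5],
          [0.6, 0.3, 0.1]]
labels = [1, 1, 2, 2]

acc_at_1 = top_k_accuracy(scores, labels, k=1)
acc_at_2 = top_k_accuracy(scores, labels, k=2)
print(acc_at_1, acc_at_2)  # 0.5 0.75
```

Acc@5 is the same computation with k=5, which is why it is always at least as high as Acc@1 for the same model.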
