Spatiotemporal Contrastive Video Representation Learning

Abstract

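We present a self-supervised Contrastive Video Representation Learning (CVRL) method for learning spatiotemporal visual representations from unlabeled videos. Our representations are learned with a contrastive loss: two augmented clips from the same short video are pulled together in the embedding space, while clips from different videos are pushed apart. We study which data augmentations are effective for video self-supervised learning and find that both spatial and temporal information are crucial. We therefore carefully design data augmentations that involve both spatial and temporal cues. Concretely, we propose a temporally consistent spatial augmentation method that imposes strong spatial augmentations on each frame of a video while maintaining temporal consistency across frames. We also propose a sampling-based temporal augmentation method to avoid overly enforcing invariance on clips that are distant in time. On Kinetics-600, a linear classifier trained on the representations learned by CVRL achieves 70.4% top-1 accuracy with a 3D-ResNet-50 (R3D-50) backbone, outperforming ImageNet supervised pre-training by 15.7% and SimCLR unsupervised pre-training by 18.8% using the same inflated R3D-50. With a larger R3D-152 (2x filters) backbone, the performance of CVRL improves further to 72.9%, significantly closing the gap between unsupervised and supervised video representation learning. Our code and models will be available at https://github.com/tensorflow/models/tree/master/official/.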
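To make the two central ideas of the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes an InfoNCE-style form of the contrastive loss in which, for simplicity, negatives are drawn only from the other view of the batch. It also assumes that "temporally consistent" means the random crop offsets and flip decision are sampled once per clip and reused for every frame. The function names, the temperature value, and the choice of crop-and-flip as the spatial augmentation are all hypothetical and chosen for illustration.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce_loss(z_a, z_b, temperature=0.1):
    """InfoNCE-style loss over a batch of paired clip embeddings.

    z_a[i] and z_b[i] are embeddings of two augmented clips from video i
    (the positive pair). Every z_b[j] with j != i comes from a different
    video and acts as a negative. Simplifying assumption: negatives are
    taken only from the other view.
    """
    z_a = [l2_normalize(v) for v in z_a]
    z_b = [l2_normalize(v) for v in z_b]
    total = 0.0
    for i, anchor in enumerate(z_a):
        # Cosine similarity to every candidate, scaled by the temperature.
        logits = [
            sum(p * q for p, q in zip(anchor, cand)) / temperature
            for cand in z_b
        ]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the positive (index i) as the target.
        total += log_denom - logits[i]
    return total / len(z_a)


def temporally_consistent_crop_flip(clip, crop_h, crop_w, rng):
    """Randomly crop and flip a clip, using the SAME parameters for all frames.

    clip is a list of frames; each frame is a list of rows of pixel values.
    The randomness is sampled once per clip, so motion between frames is
    preserved even though the spatial transform is random.
    """
    height, width = len(clip[0]), len(clip[0][0])
    top = rng.randint(0, height - crop_h)    # inclusive bounds
    left = rng.randint(0, width - crop_w)
    flip = rng.random() < 0.5
    augmented = []
    for frame in clip:
        cropped = [row[left:left + crop_w] for row in frame[top:top + crop_h]]
        if flip:
            cropped = [row[::-1] for row in cropped]
        augmented.append(cropped)
    return augmented
```

Under these assumptions, `info_nce_loss` is small when each clip embedding is closest to its partner from the same video and large when the pairing is scrambled. Sampling the crop and flip per frame instead of per clip would introduce artificial jitter between frames, which is the inconsistency the temporally consistent augmentation is meant to avoid.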

Benchmarks

Benchmark | Method | Metrics
self-supervised-action-recognition-on | CVRL (R3D-50) | Top-1 Accuracy: 70.4
self-supervised-action-recognition-on | CVRL (R3D-101) | Top-1 Accuracy: 71.6
self-supervised-action-recognition-on | CVRL (R3D-152 2x) | Top-1 Accuracy: 72.9
self-supervised-action-recognition-on-1 | CVRL (R3D-101) | Top-1 accuracy %: 67.6
self-supervised-action-recognition-on-1 | CVRL (R3D-152 2x; K600 pretrain) | Top-1 accuracy %: 71.6
self-supervised-action-recognition-on-1 | CVRL (R3D-50) | Top-1 accuracy %: 66.1
self-supervised-action-recognition-on-hmdb51 | CVRL (R3D-152 2x; K600) | Frozen: false; Pre-Training Dataset: Kinetics600; Top-1 Accuracy: 69.9
self-supervised-action-recognition-on-hmdb51 | CVRL (R3D-50; K400) | Frozen: false; Pre-Training Dataset: Kinetics400; Top-1 Accuracy: 66.7
self-supervised-action-recognition-on-hmdb51 | CVRL (R3D-50; K600) | Frozen: false; Pre-Training Dataset: Kinetics600; Top-1 Accuracy: 68.0
self-supervised-action-recognition-on-hmdb51-1 | CVRL (R3D-152 2x; K600) | Pretraining Dataset: K600; Top-1 Accuracy: 69.9
self-supervised-action-recognition-on-hmdb51-1 | CVRL (R3D-50; K600) | Pretraining Dataset: K600; Top-1 Accuracy: 68.0
self-supervised-action-recognition-on-hmdb51-1 | CVRL (R3D-50; K400) | Pretraining Dataset: K400; Top-1 Accuracy: 66.7
self-supervised-action-recognition-on-ucf101 | CVRL (R3D-50; K400) | 3-fold Accuracy: 92.2; Frozen: false; Pre-Training Dataset: Kinetics400
self-supervised-action-recognition-on-ucf101 | CVRL (R3D-50; K600) | 3-fold Accuracy: 93.4; Frozen: false; Pre-Training Dataset: Kinetics600
self-supervised-action-recognition-on-ucf101 | CVRL (R3D-152 2x; K600) | 3-fold Accuracy: 93.9; Frozen: false; Pre-Training Dataset: Kinetics600
self-supervised-action-recognition-on-ucf101-1 | CVRL (R3D-50; K400) | 3-fold Accuracy: 92.2; Pretrain: K400
self-supervised-action-recognition-on-ucf101-1 | CVRL (R3D-50; K600) | 3-fold Accuracy: 93.4; Pretrain: K600
self-supervised-action-recognition-on-ucf101-1 | CVRL (R3D-152 2x; K600) | 3-fold Accuracy: 93.9; Pretrain: K600
