
Abstract
This paper introduces a new sequence-to-sequence learning framework for RGB-based and multi-modal object tracking. First, we propose SeqTrack for RGB-based tracking. It casts visual tracking as a sequence generation task, predicting the target bounding box in an autoregressive manner. This differs from prior trackers that rely on intricately designed head networks, such as classification and regression heads. SeqTrack adopts a plain encoder-decoder Transformer architecture: the encoder extracts features with a bidirectional Transformer, while the decoder generates the bounding-box sequence autoregressively with a causal Transformer. The loss function is a simple cross-entropy loss. Second, we introduce SeqTrackv2, a unified sequence-to-sequence framework for multi-modal tracking tasks. Building on SeqTrack, SeqTrackv2 integrates a unified interface for auxiliary modalities and a set of task-prompt tokens to specify the task at hand. This enables it to handle multi-modal tracking tasks with a single model and parameter set. This sequence-learning paradigm not only simplifies the tracking framework but also achieves superior performance on 14 challenging benchmarks spanning five single-modal and multi-modal tracking tasks. Code and models are available at https://github.com/chenxin-dlut/SeqTrackv2.
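To make the sequence-generation idea concrete, here is a minimal, hypothetical PyTorch sketch of the core mechanism the abstract describes: the box is quantized into four discrete coordinate tokens (x, y, w, h), a causal Transformer decoder generates them autoregressively conditioned on encoder features, and training uses plain cross-entropy. All sizes, names (`TinySeqTracker`, `NUM_BINS`), and the tiny model configuration are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

# Illustrative constants (assumptions, not the paper's values):
NUM_BINS = 1000          # each box coordinate is quantized into NUM_BINS bins
START = NUM_BINS         # special start-of-sequence token
VOCAB = NUM_BINS + 1

class TinySeqTracker(nn.Module):
    """Toy causal decoder that emits 4 coordinate tokens, SeqTrack-style."""
    def __init__(self, dim=64):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, dim)
        layer = nn.TransformerDecoderLayer(dim, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=1)
        self.head = nn.Linear(dim, VOCAB)

    def forward(self, memory, tokens):
        # memory: (B, S, dim) stand-in for encoder features
        # tokens: (B, T) previously generated tokens
        x = self.embed(tokens)
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        out = self.decoder(x, memory, tgt_mask=mask)
        return self.head(out)  # (B, T, VOCAB) next-token logits

    @torch.no_grad()
    def generate(self, memory):
        # Autoregressive decoding of the 4 box tokens: x, y, w, h.
        tokens = torch.full((memory.size(0), 1), START, dtype=torch.long)
        for _ in range(4):
            logits = self.forward(memory, tokens)
            nxt = logits[:, -1].argmax(-1, keepdim=True)
            tokens = torch.cat([tokens, nxt], dim=1)
        return tokens[:, 1:]  # drop the start token

model = TinySeqTracker()
memory = torch.randn(2, 10, 64)              # fake encoder output
target = torch.randint(0, NUM_BINS, (2, 4))  # quantized ground-truth box
inp = torch.cat([torch.full((2, 1), START), target], dim=1)[:, :-1]
logits = model(memory, inp)
# Training objective: simple cross-entropy over the token vocabulary.
loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB), target.reshape(-1))
box_tokens = model.generate(memory)          # (2, 4) predicted coordinate tokens
```

In the actual method the memory would come from a bidirectional Transformer encoder over template and search images, and the predicted tokens would be de-quantized back to box coordinates.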
Code Repository
chenxin-dlut/seqtrackv2 (official, PyTorch)
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| rgb-t-tracking-on-lasher | SeqTrackv2-L256 | Precision: 74.1 Success: 58.8 |
| rgb-t-tracking-on-lasher | SeqTrackv2-B256 | Precision: 70.4 Success: 55.8 |
| rgb-t-tracking-on-lasher | SeqTrackv2-L384 | Precision: 76.7 Success: 61.0 |
| rgb-t-tracking-on-lasher | SeqTrackv2-B384 | Precision: 71.5 Success: 56.2 |
| rgb-t-tracking-on-rgbt234 | SeqTrackv2-L256 | Precision: 92.3 Success: 68.5 |
| rgb-t-tracking-on-rgbt234 | SeqTrackv2-B384 | Precision: 90.0 Success: 66.3 |
| rgb-t-tracking-on-rgbt234 | SeqTrackv2-L384 | Precision: 91.3 Success: 68.0 |
| rgb-t-tracking-on-rgbt234 | SeqTrackv2-B256 | Precision: 88.0 Success: 64.7 |
| visual-object-tracking-on-got-10k | SeqTrack-L384 | Average Overlap: 74.8 Success Rate 0.5: 81.9 Success Rate 0.75: 72.2 |
| visual-object-tracking-on-lasot | SeqTrack-L384 | AUC: 72.5 Normalized Precision: 81.5 Precision: 79.3 |
| visual-object-tracking-on-lasot-ext | SeqTrack-L384 | AUC: 50.7 Normalized Precision: 61.6 Precision: 57.5 |
| visual-object-tracking-on-needforspeed | SeqTrack-L384 | AUC: 0.662 |
| visual-object-tracking-on-otb-2015 | SeqTrack-L384 | AUC: 0.683 |
| visual-object-tracking-on-tnl2k | SeqTrack-L384 | AUC: 57.8 |
| visual-object-tracking-on-trackingnet | SeqTrack-L384 | Accuracy: 85.5 Normalized Precision: 89.8 Precision: 85.8 |
| visual-object-tracking-on-uav123 | SeqTrack-L384 | AUC: 0.685 |