
Abstract
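We present ARTrack, an autoregressive framework for visual object tracking. ARTrack treats tracking as a coordinate sequence interpretation task: it estimates the object trajectory progressively, with the current estimate derived from previous states and in turn influencing the subsequent sequence. This time-autoregressive formulation models how the trajectory evolves over time, which keeps the object tracked across frames and outperforms existing template-matching trackers that consider only per-frame localization accuracy. ARTrack is simple and direct, requiring no customized localization heads or post-processing. Despite its simplicity, ARTrack achieves state-of-the-art tracking performance on prevailing benchmark datasets.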
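The autoregressive formulation above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`quantize`, `dequantize`, `track`, `constant_velocity`), the bin count of 1000, and the history length are all assumptions made for illustration, and the toy predictor ignores image content entirely, whereas the actual model would also condition on visual features. The sketch only shows the control flow in which each estimate is computed from previous states and then becomes context for later estimates.

```python
from typing import Callable, List, Sequence, Tuple

# Box as (x_min, y_min, x_max, y_max), normalized to [0, 1] (assumed convention).
Box = Tuple[float, float, float, float]
Tokens = List[int]


def quantize(box: Box, bins: int = 1000) -> Tokens:
    """Map each normalized coordinate to an integer token in [0, bins - 1]."""
    return [min(bins - 1, max(0, int(c * bins))) for c in box]


def dequantize(tokens: Sequence[int], bins: int = 1000) -> Box:
    """Map tokens back to the center of their coordinate bin."""
    x0, y0, x1, y1 = ((t + 0.5) / bins for t in tokens)
    return (x0, y0, x1, y1)


def track(
    first_box: Box,
    num_frames: int,
    predict_next: Callable[[List[Tokens]], Tokens],
    history: int = 3,
) -> List[Box]:
    """Roll out a trajectory autoregressively.

    Each new estimate is produced from the last `history` estimates and is
    then appended to the trajectory, so it influences every later step.
    """
    trajectory: List[Tokens] = [quantize(first_box)]
    for _ in range(num_frames - 1):
        context = trajectory[-history:]
        trajectory.append(predict_next(context))
    return [dequantize(t) for t in trajectory]


def constant_velocity(context: List[Tokens], bins: int = 1000) -> Tokens:
    """Hypothetical stand-in for a learned model: repeat the last displacement."""
    if len(context) < 2:
        return list(context[-1])
    prev, last = context[-2], context[-1]
    return [min(bins - 1, max(0, 2 * l - p)) for p, l in zip(prev, last)]
```

Swapping `constant_velocity` for a learned sequence model that also sees the search image would keep the same loop: the output is still a short sequence of coordinate tokens, which is one way to read the claim that no customized localization head or post-processing is needed.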
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| video-object-tracking-on-nv-vot211 | ARTrack-L | AUC: 35.92; Precision: 51.64 |
| visual-object-tracking-on-got-10k | ARTrack-L | Average Overlap: 78.5; Success Rate 0.5: 87.4; Success Rate 0.75: 77.8 |
| visual-object-tracking-on-lasot | ARTrack-L | AUC: 73.1; Normalized Precision: 82.2; Precision: 80.3 |
| visual-object-tracking-on-lasot-ext | ARTrack-L | AUC: 52.8; Normalized Precision: 62.9; Precision: 59.7 |
| visual-object-tracking-on-tnl2k | ARTrack-L | AUC: 60.3 |
| visual-object-tracking-on-trackingnet | ARTrack-L | Accuracy: 85.6; Normalized Precision: 89.6; Precision: 86.0 |
| visual-object-tracking-on-uav123 | ARTrack-L | AUC: 0.712 |
| visual-tracking-on-tnl2k | ARTrack-L | AUC: 60.3 |