Minji Kim; Seungkwan Lee; Jungseul Ok; Bohyung Han; Minsu Cho

Abstract
Despite the extensive adoption of machine learning for visual object tracking, recent learning-based approaches have largely overlooked the fact that visual tracking is a sequence-level task by nature. They rely heavily on frame-level training, which inevitably induces inconsistency between training and testing in terms of both data distributions and task objectives. This work introduces a sequence-level training strategy for visual tracking based on reinforcement learning and discusses how a sequence-level design of data sampling, learning objectives, and data augmentation can improve the accuracy and robustness of tracking algorithms. Our experiments on standard benchmarks including LaSOT, TrackingNet, and GOT-10k demonstrate that four representative tracking models, SiamRPN++, SiamAttn, TransT, and TrDiMP, consistently improve when the proposed methods are incorporated in training, without modifying their architectures.
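To make the idea of a sequence-level objective concrete, the following is a minimal sketch of a generic policy-gradient (REINFORCE-style) surrogate loss in which the reward is computed once over a whole tracked sequence rather than per frame. It is an illustration under assumptions, not the authors' implementation: the function names, the use of mean overlap as the reward, and the greedy-rollout baseline are all hypothetical choices that the abstract does not specify.

```python
import math
from typing import Sequence


def sequence_reward(overlaps: Sequence[float]) -> float:
    """Assumed reward: mean overlap (IoU) over every frame of one rollout."""
    return sum(overlaps) / len(overlaps)


def reinforce_loss(log_probs: Sequence[float], reward: float, baseline: float) -> float:
    """Generic REINFORCE surrogate for a single sampled rollout.

    log_probs: log-probability of each sampled per-frame decision.
    reward:    sequence-level reward of the sampled rollout.
    baseline:  reference reward used to reduce variance (an assumption here).
    """
    advantage = reward - baseline
    # Minimizing this raises the likelihood of rollouts that beat the baseline.
    return -advantage * sum(log_probs)


# Hypothetical per-frame overlaps from two rollouts over the same video.
sampled_overlaps = [0.9, 0.8, 0.7]  # decisions sampled from the tracker
greedy_overlaps = [0.8, 0.6, 0.4]   # highest-scoring decisions, used as baseline

loss = reinforce_loss(
    log_probs=[math.log(0.5)] * 3,
    reward=sequence_reward(sampled_overlaps),
    baseline=sequence_reward(greedy_overlaps),
)
```

Because the reward depends on the tracker's own earlier predictions across the sequence, errors that accumulate over time affect the training signal, which frame-level training does not capture.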
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| video-object-tracking-on-nv-vot211 | SLT-TransT | AUC: 37.22; Precision: 51.70 |
| visual-object-tracking-on-got-10k | SLT-TransT | Average Overlap: 67.5; Success Rate 0.5: 76.8; Success Rate 0.75: 60.3 |
| visual-object-tracking-on-lasot | SLT-TransT | AUC: 66.8; Normalized Precision: 75.5 |
| visual-object-tracking-on-trackingnet | SLT-TransT | Accuracy: 82.8; Normalized Precision: 87.5; Precision: 81.4 |
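As a reading aid for the overlap-based metrics in the table, here is a minimal sketch of how Average Overlap and Success Rate at a threshold are commonly computed for a single sequence. The `(x, y, width, height)` box format and the function names are assumptions for illustration; official benchmark toolkits aggregate over many sequences and may differ in detail.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def average_overlap(preds: Sequence[Box], gts: Sequence[Box]) -> float:
    """Mean IoU between predicted and ground-truth boxes over all frames."""
    overlaps = [iou(p, g) for p, g in zip(preds, gts)]
    return sum(overlaps) / len(overlaps)


def success_rate(preds: Sequence[Box], gts: Sequence[Box], threshold: float) -> float:
    """Fraction of frames whose IoU exceeds the given threshold."""
    overlaps = [iou(p, g) for p, g in zip(preds, gts)]
    return sum(o > threshold for o in overlaps) / len(overlaps)


# Toy two-frame sequence: a perfect match, then a box shifted by half its width.
preds = [(0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 10.0)]
gts = [(0.0, 0.0, 10.0, 10.0), (5.0, 0.0, 10.0, 10.0)]
ao = average_overlap(preds, gts)
sr_50 = success_rate(preds, gts, 0.5)
```

In the toy sequence the second frame has an IoU of 1/3, so it counts toward Average Overlap but fails the 0.5 success threshold.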