ODTrack: Online Dense Temporal Token Learning for Visual Tracking
Yaozong Zheng; Bineng Zhong; Qihua Liang; Zhiyi Mo; Shengping Zhang; Xianxian Li

Abstract
Online contextual reasoning and association across consecutive video frames are critical for perceiving instances in visual tracking. However, most current top-performing trackers still rely on sparse temporal relationships between reference and search frames, learned in an offline mode. Consequently, they can only interact independently within each image pair and establish limited temporal correlations. To alleviate this problem, we propose a simple, flexible, and effective video-level tracking pipeline, named ODTrack, which densely associates the contextual relationships of video frames through online token propagation. ODTrack receives video frames of arbitrary length to capture the spatio-temporal trajectory relationships of an instance, and compresses the discriminative features (localization information) of a target into a token sequence to achieve frame-to-frame association. This new solution brings the following benefits: 1) the purified token sequences can serve as prompts for inference on the next video frame, so that past information guides future inference; 2) complex online update strategies are avoided through the iterative propagation of token sequences, which yields more efficient model representation and computation. ODTrack achieves new state-of-the-art (SOTA) performance on seven benchmarks while running at real-time speed. Code and models are available at https://github.com/GXNU-ZhongLab/ODTrack.
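The token-propagation idea in the abstract can be illustrated with a toy sketch. This is not ODTrack's implementation: the single attention step with identity projections, the zero-initialized token, the feature shapes, and the names `attend` and `track_video` are all hypothetical. It only shows the data flow the abstract describes: at each frame, one temporal token is processed jointly with the template and search features, and the updated token is carried into the next frame as a prompt, while the template itself is never updated online.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(tokens, dim):
    # Hypothetical simplification: one single-head self-attention step
    # with identity Q/K/V projections plus a residual connection.
    scores = tokens @ tokens.T / np.sqrt(dim)
    return tokens + softmax(scores, axis=-1) @ tokens


def track_video(template, frames, dim=8):
    """Propagate one temporal token across a video of any length (sketch)."""
    temporal_token = np.zeros((1, dim))  # hypothetical initial token
    history = []
    for search in frames:
        # Joint sequence: [temporal token | template patches | search patches]
        seq = np.concatenate([temporal_token, template, search], axis=0)
        out = attend(seq, dim)
        # Keep only the updated token; it becomes the prompt for the next frame.
        # The template is reused unchanged, so no online template update is needed.
        temporal_token = out[:1]
        history.append(temporal_token.copy())
    return temporal_token, history


rng = np.random.default_rng(0)
template = rng.standard_normal((4, 8))                    # 4 template patches
frames = [rng.standard_normal((16, 8)) for _ in range(5)]  # 5 search frames
token, history = track_video(template, frames)
print(token.shape, len(history))  # (1, 8) 5
```

Because the per-frame state is a single small token rather than a growing memory of past frames, the cost per frame stays constant regardless of video length, which is the efficiency argument the abstract makes.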
Benchmarks
| Benchmark | Model | Metric | Score |
|---|---|---|---|
| semi-supervised-video-object-segmentation-on-15 | ODTrack-B | EAO | 0.581 |
| semi-supervised-video-object-segmentation-on-15 | ODTrack-L | EAO | 0.605 |
| video-object-tracking-on-nv-vot211 | ODTrack | AUC | 39.60 |
| video-object-tracking-on-nv-vot211 | ODTrack | Precision | 55.80 |
| visual-object-tracking-on-didi | ODTrack | Tracking quality | 0.608 |
| visual-object-tracking-on-got-10k | ODTrack-B | Average Overlap (AO) | 77.0 |
| visual-object-tracking-on-got-10k | ODTrack-L | Average Overlap (AO) | 78.2 |
| visual-object-tracking-on-lasot | ODTrack-B | AUC | 73.2 |
| visual-object-tracking-on-lasot | ODTrack-L | AUC | 74.0 |
| visual-object-tracking-on-lasot-ext | ODTrack-B | AUC | 52.4 |
| visual-object-tracking-on-lasot-ext | ODTrack-L | AUC | 53.9 |
| visual-object-tracking-on-otb-2015 | ODTrack-B | AUC | 0.723 |
| visual-object-tracking-on-otb-2015 | ODTrack-L | AUC | 0.724 |
| visual-object-tracking-on-tnl2k | ODTrack-B | AUC | 60.9 |
| visual-object-tracking-on-tnl2k | ODTrack-L | AUC | 61.7 |
| visual-object-tracking-on-trackingnet | ODTrack-B | Accuracy | 85.1 |
| visual-object-tracking-on-trackingnet | ODTrack-L | Accuracy | 86.1 |
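Several rows report AUC or Average Overlap, and the table mixes fractional scores (e.g. OTB-2015) with percentages (e.g. LaSOT). As a rough guide to what these numbers measure, the sketch below computes both from per-frame IoUs. It assumes the common OTB-style convention of a success plot sampled at 21 overlap thresholds from 0 to 1; each benchmark's official toolkit defines the exact protocol, and the helper names here are hypothetical.

```python
def iou(a, b):
    # Intersection-over-union of two boxes given as (x, y, w, h).
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def success_auc(ious, num_thresholds=21):
    # Success rate = fraction of frames whose IoU exceeds a threshold;
    # AUC = mean success rate over evenly spaced thresholds in [0, 1].
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    rates = [sum(1 for v in ious if v > t) / len(ious) for t in thresholds]
    return sum(rates) / len(rates)


def average_overlap(ious):
    # Average Overlap (AO): the plain mean of per-frame IoUs.
    return sum(ious) / len(ious)


ious = [1.0, 0.5, 0.0]
print(average_overlap(ious))
print(round(success_auc(ious), 3))
```

Multiplying either value by 100 gives the percentage form used in most rows of the table.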