Unified Sequence-to-Sequence Learning for Single- and Multi-Modal Visual Object Tracking

Xin Chen; Ben Kang; Jiawen Zhu; Dong Wang; Houwen Peng; Huchuan Lu


Abstract

In this paper, we introduce a new sequence-to-sequence learning framework for RGB-based and multi-modal object tracking. First, we present SeqTrack for RGB-based tracking. It casts visual tracking as a sequence generation task, predicting object bounding boxes in an autoregressive manner. This differs from previous trackers, which rely on intricately designed head networks, such as classification and regression heads. SeqTrack employs a simple encoder-decoder transformer architecture: the encoder extracts features with a bidirectional transformer, while the decoder generates bounding-box sequences autoregressively with a causal transformer. The training loss is plain cross-entropy. Second, we introduce SeqTrackv2, a unified sequence-to-sequence framework for multi-modal tracking tasks. Building upon SeqTrack, SeqTrackv2 integrates a unified interface for auxiliary modalities and a set of task-prompt tokens to specify the task, enabling it to handle multiple multi-modal tracking tasks with a single model and parameter set. This sequence-learning paradigm not only simplifies the tracking framework, but also achieves superior performance across 14 challenging benchmarks spanning five single- and multi-modal tracking tasks. The code and models are available at https://github.com/chenxin-dlut/SeqTrackv2.
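The core idea above — quantizing box coordinates into discrete tokens and generating the four values [x, y, w, h] one at a time with a causal transformer decoder — can be sketched as follows. This is a minimal illustration, not the authors' implementation; all module sizes, names, and the bin count are illustrative assumptions.

```python
# Sketch of SeqTrack-style autoregressive box decoding (illustrative only).
# Continuous bbox coordinates are quantized into discrete tokens, and a
# causal transformer decoder predicts [x, y, w, h] tokens one at a time,
# conditioned on encoder features.
import torch
import torch.nn as nn

BINS = 1000          # coordinate quantization bins (token vocabulary size)
START = BINS         # special start-of-sequence token

def box_to_tokens(box):
    """Quantize normalized [x, y, w, h] in [0, 1] into integer tokens."""
    return (box.clamp(0, 1) * (BINS - 1)).round().long()

def tokens_to_box(tokens):
    """Map integer tokens back to normalized coordinates."""
    return tokens.float() / (BINS - 1)

class SeqBoxDecoder(nn.Module):
    def __init__(self, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(BINS + 1, d_model)   # +1 for START
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.head = nn.Linear(d_model, BINS)

    def forward(self, memory, tgt_tokens):
        # memory: (B, S, d_model) encoder features; tgt_tokens: (B, T)
        tgt = self.embed(tgt_tokens)
        mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        out = self.decoder(tgt, memory, tgt_mask=mask)
        return self.head(out)                          # (B, T, BINS) logits

    @torch.no_grad()
    def generate(self, memory):
        """Greedy autoregressive generation of the 4 box tokens."""
        batch = memory.size(0)
        seq = torch.full((batch, 1), START, dtype=torch.long)
        for _ in range(4):                             # x, y, w, h
            logits = self.forward(memory, seq)
            nxt = logits[:, -1].argmax(dim=-1, keepdim=True)
            seq = torch.cat([seq, nxt], dim=1)
        return seq[:, 1:]                              # drop START token

decoder = SeqBoxDecoder()
memory = torch.randn(2, 16, 64)        # stand-in for encoder output
tokens = decoder.generate(memory)      # (2, 4) discrete box tokens
boxes = tokens_to_box(tokens)          # (2, 4) normalized coordinates
```

Training would simply apply cross-entropy between the predicted logits and the quantized ground-truth tokens, which is why no specialized regression or classification head is needed.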

Code Repositories

chenxin-dlut/seqtrackv2 (official implementation, PyTorch)

Benchmarks

RGB-T Tracking on LasHeR
  SeqTrackv2-L256: Precision 74.1, Success 58.8
  SeqTrackv2-B256: Precision 70.4, Success 55.8
  SeqTrackv2-L384: Precision 76.7, Success 61.0
  SeqTrackv2-B384: Precision 71.5, Success 56.2

RGB-T Tracking on RGBT234
  SeqTrackv2-L256: Precision 92.3, Success 68.5
  SeqTrackv2-B384: Precision 90.0, Success 66.3
  SeqTrackv2-L384: Precision 91.3, Success 68.0
  SeqTrackv2-B256: Precision 88.0, Success 64.7

Visual Object Tracking (SeqTrack-L384)
  GOT-10k: Average Overlap 74.8, Success Rate@0.5 81.9, Success Rate@0.75 72.2
  LaSOT: AUC 72.5, Normalized Precision 81.5, Precision 79.3
  LaSOT-ext: AUC 50.7, Normalized Precision 61.6, Precision 57.5
  NeedForSpeed: AUC 0.662
  OTB-2015: AUC 0.683
  TNL2K: AUC 57.8
  TrackingNet: Accuracy 85.5, Normalized Precision 89.8, Precision 85.8
  UAV123: AUC 0.685
