Andong Lu; Wanyu Wang; Chenglong Li; Jin Tang; Bin Luo

Abstract
Multi-modal feature fusion is a core component of RGBT tracking and has been the subject of numerous studies in recent years. However, existing RGBT tracking methods widely adopt fixed fusion structures to integrate multi-modal features, and such structures struggle to handle the various challenges of dynamic scenarios. To address this problem, this work presents a novel Attention-based Fusion router, called AFter, which optimizes the fusion structure to adapt to dynamic, challenging scenarios for robust RGBT tracking. In particular, we design a fusion structure space based on a hierarchical attention network, in which each attention-based fusion unit corresponds to a fusion operation and a combination of these units corresponds to a fusion structure. By optimizing the combination of attention-based fusion units, we can dynamically select the fusion structure to adapt to various challenging scenarios. Unlike the complex search over different structures in neural architecture search algorithms, we develop a dynamic routing algorithm that equips each attention-based fusion unit with a router to predict the combination weights, enabling efficient optimization of the fusion structure. Extensive experiments on five mainstream RGBT tracking datasets demonstrate the superior performance of the proposed AFter against state-of-the-art RGBT trackers. We release the code at https://github.com/Alexadlu/AFter.
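The routing idea in the abstract, a router that predicts combination weights over fusion units, can be illustrated with a minimal sketch. This is not the authors' implementation. The three fusion units, the pooled-mean router input, the softmax normalization, and all names (`unit_rgb_enhance`, `router`, `routed_fusion`, `ROUTER_WEIGHTS`) are hypothetical stand-ins chosen for illustration. The real attention-based units and router design are in the released code.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical stand-ins for attention-based fusion units.
# Each takes an RGB and a thermal (TIR) feature vector and returns a fused vector.
def unit_rgb_enhance(rgb, tir):
    return [r + 0.5 * t for r, t in zip(rgb, tir)]


def unit_tir_enhance(rgb, tir):
    return [0.5 * r + t for r, t in zip(rgb, tir)]


def unit_identity(rgb, tir):
    # Skip path: pass the RGB features through unchanged.
    return list(rgb)


UNITS = [unit_rgb_enhance, unit_tir_enhance, unit_identity]


def router(rgb, tir, router_weights):
    """Predict one combination weight per fusion unit from pooled input features.

    router_weights holds one (w_rgb, w_tir, bias) triple per unit (assumed form).
    """
    pooled_rgb = sum(rgb) / len(rgb)
    pooled_tir = sum(tir) / len(tir)
    scores = [w_r * pooled_rgb + w_t * pooled_tir + b for w_r, w_t, b in router_weights]
    return softmax(scores)


def routed_fusion(rgb, tir, router_weights):
    """Combine the outputs of all fusion units using input-dependent weights."""
    weights = router(rgb, tir, router_weights)
    outputs = [unit(rgb, tir) for unit in UNITS]
    fused = [
        sum(w * out[i] for w, out in zip(weights, outputs))
        for i in range(len(rgb))
    ]
    return fused, weights


# Illustrative router parameters (in practice these would be learned).
ROUTER_WEIGHTS = [(1.0, -1.0, 0.0), (-1.0, 1.0, 0.0), (0.0, 0.0, 0.0)]

rgb = [0.9, 0.8, 0.7]   # strong RGB response
tir = [0.1, 0.2, 0.1]   # weak thermal response
fused, weights = routed_fusion(rgb, tir, ROUTER_WEIGHTS)
```

Because the weights depend on the input features, different inputs yield different effective fusion structures without an explicit architecture search. In this toy setup, a strong RGB response shifts weight toward the first unit.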
Benchmarks
| Benchmark | Method | Precision | Success |
|---|---|---|---|
| GTOT | AFter | 91.6 | 78.5 |
| LasHeR | AFter | 70.3 | 55.1 |
| RGBT210 | AFter | 87.6 | 63.5 |
| RGBT234 | AFter | 90.1 | 66.7 |