5 months ago

MambaVT: Spatio-Temporal Contextual Modeling for robust RGB-T Tracking

Simiao Lai; Chang Liu; Jiawen Zhu; Ben Kang; Yang Liu; Dong Wang; Huchuan Lu

Abstract

Existing RGB-T tracking algorithms have made remarkable progress by leveraging the global interaction capability and extensive pre-trained models of the Transformer architecture. Nonetheless, these methods mainly rely on image-pair appearance matching and are burdened by the quadratic complexity of the attention mechanism, which constrains their exploitation of temporal information. Inspired by the recently emerged State Space Model Mamba, known for its long-sequence modeling capability and linear computational complexity, this work proposes a pure Mamba-based framework (MambaVT) that fully exploits spatio-temporal contextual modeling for robust visible-thermal tracking. Specifically, we devise a long-range cross-frame integration component to globally adapt to target appearance variations, and introduce short-term historical trajectory prompts to predict subsequent target states from local temporal location cues. Extensive experiments show the significant potential of vision Mamba for RGB-T tracking, with MambaVT achieving state-of-the-art performance on four mainstream benchmarks while requiring lower computational cost. We aim for this work to serve as a simple yet strong baseline that stimulates future research in this field. The code and pre-trained models will be made available.
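To make the linear-complexity claim in the abstract concrete, here is a minimal sketch of a scalar, discretized linear state space recurrence. It is not the MambaVT implementation: Mamba uses input-dependent (selective) parameters and a hardware-aware scan, while this sketch assumes fixed scalar parameters. The function name `ssm_scan` and the parameters `a`, `b`, `c` are hypothetical and chosen only for illustration.

```python
from typing import List


def ssm_scan(xs: List[float], a: float, b: float, c: float) -> List[float]:
    """Run a scalar linear state space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t

    Each input is visited exactly once, so the cost grows linearly with
    the sequence length. Self-attention, by contrast, compares every
    token with every other token, which grows quadratically.
    """
    h = 0.0  # hidden state summarizing everything seen so far
    ys = []
    for x in xs:
        h = a * h + b * x  # fold the new input into the running state
        ys.append(c * h)   # read out the current state
    return ys


if __name__ == "__main__":
    # An impulse followed by zeros: the output decays by a factor of `a` per step.
    print(ssm_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0))
```

Because the state `h` carries context forward, appending tokens from additional frames only adds as many steps as there are new tokens, which is the property that makes long cross-frame sequences affordable.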

Benchmarks

Benchmark                   Methodology    Precision  Success
rgb-t-tracking-on-gtot      MambaVT-M256   95.2       78.6
rgb-t-tracking-on-gtot      MambaVT-S256   94.1       75.3
rgb-t-tracking-on-lasher    MambaVT-M256   72.7       57.5
rgb-t-tracking-on-lasher    MambaVT-S256   73.0       57.9
rgb-t-tracking-on-rgbt210   MambaVT-M256   88.5       64.4
rgb-t-tracking-on-rgbt210   MambaVT-S256   88.0       63.7
rgb-t-tracking-on-rgbt234   MambaVT-S256   88.9       65.8
rgb-t-tracking-on-rgbt234   MambaVT-M256   90.7       67.5
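
As a rough guide to reading the Precision and Success columns, the sketch below shows how these two metrics are commonly computed in single-object tracking: precision as the fraction of frames whose predicted box center lies within a pixel threshold of the ground truth, and success as the area under the curve of frames whose overlap (IoU) exceeds each threshold in [0, 1]. The exact protocol varies by benchmark, so the threshold value, the box format `(x, y, w, h)`, and all function names here are assumptions, not the evaluation code behind the numbers above.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x, y, w, h), top-left corner plus size


def center_error(pred: Box, gt: Box) -> float:
    """Euclidean distance between the centers of two boxes, in pixels."""
    px, py = pred[0] + pred[2] / 2, pred[1] + pred[3] / 2
    gx, gy = gt[0] + gt[2] / 2, gt[1] + gt[3] / 2
    return math.hypot(px - gx, py - gy)


def iou(pred: Box, gt: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    x1, y1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    x2 = min(pred[0] + pred[2], gt[0] + gt[2])
    y2 = min(pred[1] + pred[3], gt[1] + gt[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = pred[2] * pred[3] + gt[2] * gt[3] - inter
    return inter / union if union > 0 else 0.0


def precision_rate(preds: List[Box], gts: List[Box], threshold: float = 20.0) -> float:
    """Fraction of frames with center error within `threshold` pixels (assumed default)."""
    hits = sum(center_error(p, g) <= threshold for p, g in zip(preds, gts))
    return hits / len(gts)


def success_rate(preds: List[Box], gts: List[Box], steps: int = 21) -> float:
    """Mean of the success curve sampled at `steps` evenly spaced IoU thresholds in [0, 1]."""
    overlaps = [iou(p, g) for p, g in zip(preds, gts)]
    thresholds = [i / (steps - 1) for i in range(steps)]
    curve = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(curve) / len(curve)


if __name__ == "__main__":
    gts = [(0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 10.0)]
    preds = [(0.0, 0.0, 10.0, 10.0), (100.0, 100.0, 10.0, 10.0)]  # one perfect frame, one miss
    print(precision_rate(preds, gts), success_rate(preds, gts))
```

Both functions return values in [0, 1]; the table reports them as percentages.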
