TubeDETR: Spatio-Temporal Video Grounding with Transformers

Antoine Yang, Antoine Miech, Josef Sivic, Ivan Laptev, Cordelia Schmid

Abstract

We consider the problem of localizing a spatio-temporal tube in a video corresponding to a given text query. This is a challenging task that requires the joint and efficient modeling of temporal, spatial and multi-modal interactions. To address this task, we propose TubeDETR, a transformer-based architecture inspired by the recent success of such models for text-conditioned object detection. Our model notably includes: (i) an efficient video and text encoder that models spatial multi-modal interactions over sparsely sampled frames and (ii) a space-time decoder that jointly performs spatio-temporal localization. We demonstrate the advantage of our proposed components through an extensive ablation study. We also evaluate our full approach on the spatio-temporal video grounding task and demonstrate improvements over the state of the art on the challenging VidSTG and HC-STVG benchmarks. Code and trained models are publicly available at https://antoyang.github.io/tubedetr.html.

Code Repositories

antoyang/TubeDETR (official, PyTorch; mentioned in GitHub)

Benchmarks

| Benchmark | Methodology | Subset | m_vIoU | vIoU@0.3 | vIoU@0.5 |
|---|---|---|---|---|---|
| spatio-temporal-video-grounding-on-hc-stvg1 | TubeDETR | - | 32.4 | 49.8 | 23.5 |
| spatio-temporal-video-grounding-on-hc-stvg2 | TubeDETR | Val | 36.4 | 58.8 | 30.6 |
| spatio-temporal-video-grounding-on-vidstg | TubeDETR | Declarative | 30.4 | 42.5 | 28.2 |
| spatio-temporal-video-grounding-on-vidstg | TubeDETR | Interrogative | 25.7 | 35.7 | 23.2 |
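
The page does not define the reported metrics, so the following is a minimal sketch of how m_vIoU and vIoU@R are commonly computed for spatio-temporal video grounding. It assumes that a sample's vIoU is the sum of per-frame box IoUs over frames where prediction and ground truth overlap in time, divided by the number of frames in their temporal union, and that vIoU@R counts samples with vIoU strictly above R. The function names (`box_iou`, `viou`, `summarize`) and the dict-of-boxes tube representation are hypothetical and are not taken from the TubeDETR code.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), an assumed box format


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def viou(pred: Dict[int, Box], gt: Dict[int, Box]) -> float:
    """vIoU of one sample; tubes map frame index -> box (hypothetical layout)."""
    union_frames = set(pred) | set(gt)
    if not union_frames:
        return 0.0
    inter_frames = set(pred) & set(gt)
    # Frames outside the temporal intersection contribute zero to the sum.
    total = sum(box_iou(pred[t], gt[t]) for t in inter_frames)
    return total / len(union_frames)


def summarize(scores: List[float],
              thresholds: Sequence[float] = (0.3, 0.5)) -> Dict[str, float]:
    """Mean vIoU and the fraction of samples above each threshold."""
    n = len(scores)
    out = {"m_vIoU": sum(scores) / n}
    for r in thresholds:
        out[f"vIoU@{r}"] = sum(s > r for s in scores) / n
    return out


if __name__ == "__main__":
    gt = {0: (0, 0, 10, 10), 1: (0, 0, 10, 10)}
    pred = {1: (0, 0, 10, 10), 2: (0, 0, 10, 10)}
    print(summarize([viou(pred, gt)]))
```

These functions return fractions in [0, 1]; the benchmark figures above appear to be the same quantities expressed as percentages.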
