3 months ago

TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement

Carl Doersch, Yi Yang, Mel Vecerik, Dilara Gokay, Ankush Gupta, Yusuf Aytar, Joao Carreira, Andrew Zisserman

Abstract

We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence. Our approach employs two stages: (1) a matching stage, which independently locates a suitable candidate point match for the query point on every other frame, and (2) a refinement stage, which updates both the trajectory and query features based on local correlations. The resulting model surpasses all baseline methods by a significant margin on the TAP-Vid benchmark, as demonstrated by an approximate 20% absolute average Jaccard (AJ) improvement on DAVIS. Our model facilitates fast inference on long and high-resolution video sequences. On a modern GPU, our implementation has the capacity to track points faster than real-time, and can be flexibly extended to higher-resolution videos. Given the high-quality trajectories extracted from a large dataset, we demonstrate a proof-of-concept diffusion model which generates trajectories from static images, enabling plausible animations. Visualizations, source code, and pretrained models can be found on our project webpage.
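The two stages described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`match_per_frame`, `refine_locally`), the dot-product similarity, and the softmax-weighted local window are all assumptions chosen for illustration, and the sketch omits the query-feature update that the abstract mentions. It only shows the general shape of "match independently per frame, then refine using local correlations".

```python
import math


def dot(a, b):
    """Dot-product similarity between two feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def match_per_frame(query_feat, frame_feats):
    """Stage 1 (toy): in each frame independently, pick the grid cell
    whose feature is most similar to the query feature."""
    matches = []
    for grid in frame_feats:  # grid[y][x] is a feature vector
        best = max(
            ((y, x) for y in range(len(grid)) for x in range(len(grid[0]))),
            key=lambda p: dot(query_feat, grid[p[0]][p[1]]),
        )
        matches.append(best)
    return matches


def refine_locally(query_feat, grid, pos, radius=1):
    """Stage 2 (hypothetical stand-in): re-estimate the position as a
    softmax-weighted average of cell coordinates in a small window
    around the initial match, using local similarity scores."""
    y0, x0 = pos
    cells = [
        (y, x)
        for y in range(max(0, y0 - radius), min(len(grid), y0 + radius + 1))
        for x in range(max(0, x0 - radius), min(len(grid[0]), x0 + radius + 1))
    ]
    scores = [dot(query_feat, grid[y][x]) for y, x in cells]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    ry = sum(w * y for w, (y, _) in zip(weights, cells)) / total
    rx = sum(w * x for w, (_, x) in zip(weights, cells)) / total
    return ry, rx


def make_frame(size, target):
    """Synthetic frame: the target cell carries [1, 0], all others [0, 1]."""
    return [
        [[1.0, 0.0] if (y, x) == target else [0.0, 1.0] for x in range(size)]
        for y in range(size)
    ]


# Tiny synthetic "video": the target moves along the diagonal.
video = [make_frame(4, (t, t)) for t in range(3)]
query = [1.0, 0.0]

matches = match_per_frame(query, video)
refined = [refine_locally(query, g, p) for g, p in zip(video, matches)]
print(matches)  # [(0, 0), (1, 1), (2, 2)]
```

Because each frame is matched independently in the first stage, that step parallelizes across frames, which is one plausible reading of why the abstract emphasizes fast inference on long videos.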

Benchmarks

| Benchmark | Methodology | Metrics |
| --- | --- | --- |
| visual-tracking-on-davis | TAPIR (Panning MOVi-E) | Average Jaccard: 61.3 |
| visual-tracking-on-davis | TAPIR (MOVi-E) | Average Jaccard: 59.8 |
| visual-tracking-on-kinetics | TAPIR (Panning MOVi-E) | Average Jaccard: 57.2 |
| visual-tracking-on-kinetics | TAPIR (MOVi-E) | Average Jaccard: 57.1 |
| visual-tracking-on-kubric | TAPIR (MOVi-E) | Average Jaccard: 84.3 |
| visual-tracking-on-kubric | TAPIR (Panning MOVi-E) | Average Jaccard: 84.7 |
| visual-tracking-on-rgb-stacking | TAPIR (MOVi-E) | Average Jaccard: 66.2 |
| visual-tracking-on-rgb-stacking | TAPIR (Panning MOVi-E) | Average Jaccard: 62.7 |
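For readers unfamiliar with the Average Jaccard (AJ) metric reported here, the following is a simplified sketch of how it is computed in the TAP-Vid benchmark, as far as I understand it: for each pixel threshold, a prediction is a true positive if the point is visible in the ground truth, predicted visible, and within the threshold; the Jaccard score is true positives divided by the sum of ground-truth positives and false positives, averaged over the thresholds. The function name and the flat per-point input layout are assumptions for illustration; the official evaluation additionally handles per-video averaging, query-frame exclusion, and a fixed evaluation resolution, which this sketch omits.

```python
def average_jaccard(gt_points, gt_visible, pred_points, pred_visible,
                    thresholds=(1, 2, 4, 8, 16)):
    """Simplified Average Jaccard over a flat list of point observations.

    gt_points / pred_points: lists of (x, y) pixel coordinates.
    gt_visible / pred_visible: lists of booleans.
    """
    scores = []
    for thr in thresholds:
        true_pos = 0
        gt_pos = 0
        false_pos = 0
        for (gx, gy), gv, (px, py), pv in zip(
            gt_points, gt_visible, pred_points, pred_visible
        ):
            within = (gx - px) ** 2 + (gy - py) ** 2 < thr ** 2
            if gv:
                gt_pos += 1
            if gv and pv and within:
                true_pos += 1
            # Predicted visible, but either occluded in the ground truth
            # or too far from the true location.
            if pv and (not gv or not within):
                false_pos += 1
        denom = gt_pos + false_pos
        scores.append(true_pos / denom if denom else 0.0)
    return sum(scores) / len(scores)


gt_points = [(10, 10), (20, 20), (30, 30), (40, 40)]
gt_visible = [True, True, False, True]
pred_points = [(10, 10), (23, 20), (30, 30), (40, 40)]
pred_visible = [True, True, True, False]

aj = average_jaccard(gt_points, gt_visible, pred_points, pred_visible)
print(round(aj, 2))  # 0.38
```

The table above reports AJ as a percentage, so a value of 0.38 from this function would correspond to 38.0 there.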
