

BootsTAP: Bootstrapped Training for Tracking-Any-Point

Abstract

To endow models with a greater understanding of physics and motion, it is useful to enable them to perceive how solid surfaces move and deform in real scenes. This can be formalized as Tracking-Any-Point (TAP), which requires the algorithm to track any point on solid surfaces in a video, potentially densely in space and time. Large-scale ground-truth training data for TAP is only available in simulation, which currently has a limited variety of objects and motion. In this work, we demonstrate how large-scale, unlabeled, uncurated real-world data can improve a TAP model with minimal architectural changes, using a self-supervised student-teacher setup. We demonstrate state-of-the-art performance on the TAP-Vid benchmark, surpassing previous results by a wide margin: for example, TAP-Vid-DAVIS performance improves from 61.3% to 67.4%, and TAP-Vid-Kinetics from 57.2% to 62.5%. For visualizations, see our project webpage at https://bootstap.github.io/
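The abstract's key training idea is the self-supervised student-teacher setup on unlabeled video. The sketch below illustrates one plausible form of such a bootstrapped update in JAX, assuming a generic `track_model`, a plain SGD student update, and an EMA teacher; the model, shapes, augmentation, and hyperparameters are illustrative placeholders, not the paper's actual architecture or training recipe.

```python
import jax
import jax.numpy as jnp

def track_model(params, video, queries):
    """Toy stand-in tracker: per-frame features from a single linear map.

    video:   [T, H, W, C] clip
    queries: [N, 2] query point coordinates (x, y)
    returns: [T, N, 2] predicted point locations per frame
    """
    feats = jnp.einsum('thwc,cd->thwd', video, params['w']).mean(axis=(1, 2))  # [T, D]
    return queries[None, :, :] + feats[:, None, :2]

def student_teacher_loss(student_params, teacher_params, video, augmented_video, queries):
    # Teacher produces pseudo-label tracks on the original clip (no gradient flows back).
    pseudo_tracks = jax.lax.stop_gradient(track_model(teacher_params, video, queries))
    # Student must reproduce those tracks from an augmented view of the same clip.
    student_tracks = track_model(student_params, augmented_video, queries)
    return jnp.mean((student_tracks - pseudo_tracks) ** 2)

@jax.jit
def train_step(student_params, teacher_params, video, augmented_video, queries,
               lr=1e-3, ema_rate=0.99):
    loss, grads = jax.value_and_grad(student_teacher_loss)(
        student_params, teacher_params, video, augmented_video, queries)
    # Plain SGD update for the student.
    student_params = jax.tree_util.tree_map(lambda p, g: p - lr * g, student_params, grads)
    # Teacher trails the student as an exponential moving average of its weights.
    teacher_params = jax.tree_util.tree_map(
        lambda t, s: ema_rate * t + (1.0 - ema_rate) * s, teacher_params, student_params)
    return student_params, teacher_params, loss

# Tiny usage example with random data standing in for an unlabeled real-world clip.
key = jax.random.PRNGKey(0)
video = jax.random.normal(key, (8, 32, 32, 3))                   # [T, H, W, C]
augmented = video + 0.05 * jax.random.normal(key, video.shape)   # stand-in augmentation
queries = jax.random.uniform(key, (16, 2)) * 32.0                # [N, 2] query points
params = {'w': 0.01 * jax.random.normal(key, (3, 8))}
student, teacher = params, params
student, teacher, loss = train_step(student, teacher, video, augmented, queries)
```

In this kind of setup the teacher's predictions on unlabeled video act as pseudo-labels, the student learns to match them under augmentation, and the EMA update slowly transfers the student's improvements back into the teacher, which is what allows the model to bootstrap from uncurated real-world data.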

Code Repositories

deepmind/tapnet (JAX), mentioned in GitHub
google-deepmind/tapnet (official, JAX), mentioned in GitHub

Benchmarks

Benchmark                              | Method     | Average Jaccard | Average PCK | Occlusion Accuracy
point-tracking-on-tap-vid-davis        | BootsTAPIR | 66.2            | 78.1        | 91
point-tracking-on-tap-vid-kinetics     | BootsTAPIR | 61.4            | 74.2        | 89.7
point-tracking-on-tap-vid-rgb-stacking | BootsTAPIR | 72.4            | 83.1        | 91.2
