Zihang Lai, Erika Lu, Weidi Xie

Abstract
Recent interest in self-supervised dense tracking has yielded rapid progress, but performance remains far from that of supervised methods. We propose a dense tracking model trained on videos without any annotations that surpasses previous self-supervised methods on existing benchmarks by a significant margin (+15%), and achieves performance comparable to supervised methods. In this paper, we first reassess the traditional choices used for self-supervised training and reconstruction loss by conducting thorough experiments that finally elucidate the optimal choices. Second, we further improve on existing methods by augmenting our architecture with a crucial memory component. Third, we benchmark on large-scale semi-supervised video object segmentation (a.k.a. dense tracking), and propose a new metric: generalizability. Our first two contributions yield a self-supervised network that, for the first time, is competitive with supervised methods on standard evaluation metrics of dense tracking. When measuring generalizability, we show that self-supervised approaches are actually superior to the majority of supervised methods. We believe this new generalizability metric can better capture the real-world use cases for dense tracking, and will spur new interest in this research direction.
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| unsupervised-video-object-segmentation-on-4 | MAST | F-measure (Mean): 67.6; F-measure (Recall): 77.7; J&F: 65.5; Jaccard (Mean): 63.3; Jaccard (Recall): 73.2 |
| visual-object-tracking-on-davis-2017 | MAST | F-measure (Mean): 67.6; F-measure (Recall): 77.7; J&F: 65.5; Jaccard (Mean): 63.3; Jaccard (Recall): 73.2 |
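
As a reading aid for the metrics in the table, the following minimal sketch shows how the Jaccard index (region similarity) of two binary masks can be computed, and how the combined J&F score relates to the two means. It assumes the usual DAVIS convention that J&F is the average of the Jaccard mean and the F-measure mean. The function names `jaccard` and `j_and_f` and the toy masks are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def jaccard(pred, gt):
    """Region similarity J: intersection over union of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumption: two empty masks are treated as a perfect match.
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def j_and_f(j_mean, f_mean):
    """Combined score, assumed to be the average of the J mean and the F mean."""
    return (j_mean + f_mean) / 2.0


# Toy masks (illustrative only): 2 overlapping pixels, 4 pixels in the union.
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
gt = np.array([[1, 0, 0],
               [0, 1, 1]])

toy_j = jaccard(pred, gt)

# Means reported in the table above.
combined = j_and_f(63.3, 67.6)
```

Under this assumption, the reported means of 63.3 and 67.6 average to about 65.45, which is consistent with the J&F value of 65.5 listed in the table.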