5 months ago

SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation

Brendan Duke; Abdalla Ahmed; Christian Wolf; Parham Aarabi; Graham W. Taylor

Abstract

In this paper we introduce a Transformer-based approach to video object segmentation (VOS). To address compounding error and scalability issues of prior work, we propose a scalable, end-to-end method for VOS called Sparse Spatiotemporal Transformers (SST). SST extracts per-pixel representations for each object in a video using sparse attention over spatiotemporal features. Our attention-based formulation for VOS allows a model to learn to attend over a history of multiple frames and provides suitable inductive bias for performing correspondence-like computations necessary for solving motion segmentation. We demonstrate the effectiveness of attention-based over recurrent networks in the spatiotemporal domain. Our method achieves competitive results on YouTube-VOS and DAVIS 2017 with improved scalability and robustness to occlusions compared with the state of the art. Code is available at https://github.com/dukebw/SSTVOS.
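The idea of sparse attention over spatiotemporal features can be illustrated with a small sketch. The abstract does not specify the sparsity pattern, so the example below assumes one plausible choice: each query pixel attends only to a local window in every earlier frame, instead of to all pixels in all frames. The function name `local_spatiotemporal_attention`, the `radius` parameter, and the reuse of features as queries, keys, and values are all hypothetical simplifications, not the authors' implementation.

```python
import math


def local_spatiotemporal_attention(features, t, y, x, radius=1):
    """Attend from pixel (t, y, x) to a local window in every earlier frame.

    features[frame][row][col] is a list of floats (one feature vector per pixel).
    Returns (output_vector, weights), where weights maps (frame, row, col)
    to its attention weight. Illustrative only: a learned model would use
    separate query/key/value projections.
    """
    query = features[t][y][x]
    dim = len(query)
    scale = 1.0 / math.sqrt(dim)
    height, width = len(features[0]), len(features[0][0])

    # Sparse key set: a (2 * radius + 1)^2 window (clipped at the borders)
    # in each past frame, rather than every pixel of every frame.
    positions = [
        (s, i, j)
        for s in range(t)
        for i in range(max(0, y - radius), min(height, y + radius + 1))
        for j in range(max(0, x - radius), min(width, x + radius + 1))
    ]
    if not positions:
        raise ValueError("need at least one earlier frame to attend over")

    # Scaled dot-product scores between the query and each selected key.
    scores = [
        scale * sum(q * k for q, k in zip(query, features[s][i][j]))
        for (s, i, j) in positions
    ]

    # Numerically stable softmax over the sparse key set.
    top = max(scores)
    exps = [math.exp(v - top) for v in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the attended feature vectors.
    output = [0.0] * dim
    for w, (s, i, j) in zip(weights, positions):
        for c in range(dim):
            output[c] += w * features[s][i][j][c]
    return output, dict(zip(positions, weights))


# Toy video: 3 frames of 4x4 pixels with 2-dimensional features.
toy = [
    [[[float(f + 1), float(r - c)] for c in range(4)] for r in range(4)]
    for f in range(3)
]
out, attn = local_spatiotemporal_attention(toy, t=2, y=1, x=1, radius=1)
```

In this toy setup the query in frame 2 attends to 18 positions (a 3x3 window in each of the two earlier frames) instead of all 32 earlier pixels, which is where the scalability benefit of a sparse pattern would come from as resolution and history length grow.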

Code Repositories

dukebw/SSTVOS (Official, PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
semi-supervised-video-object-segmentation-on-20 | SSTVOS | D17 val (F): 81.4; D17 val (G): 78.4; D17 val (J): 75.4
video-object-segmentation-on-youtube-vos-1 | SST (Local) | Jaccard (Seen): 80.9; Jaccard (Unseen): 76.6
video-object-segmentation-on-youtube-vos-2019-2 | SST | Jaccard (Seen): 80.9; Jaccard (Unseen): 76.6; Mean Jaccard & F-Measure: 81.8
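
For readers unfamiliar with the metrics, here is a minimal sketch of how the numbers relate. J is the Jaccard index (intersection over union of predicted and ground-truth masks), and on DAVIS 2017 the global score G is conventionally the mean of J and the contour F-measure. The helper names `jaccard` and `global_mean` are hypothetical, the masks are toy data, and the F-measure itself is not computed here because it requires boundary matching.

```python
def jaccard(pred, gt):
    """Intersection over union of two binary masks (nested lists of 0/1)."""
    pairs = [(p, g) for pred_row, gt_row in zip(pred, gt)
             for p, g in zip(pred_row, gt_row)]
    intersection = sum(p & g for p, g in pairs)
    union = sum(p | g for p, g in pairs)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


def global_mean(j, f):
    """Mean of region similarity J and contour accuracy F."""
    return (j + f) / 2.0


# Toy masks: 2 overlapping pixels out of 4 pixels covered by either mask.
pred_mask = [[1, 1, 0],
             [0, 1, 0]]
gt_mask = [[1, 0, 0],
           [0, 1, 1]]
toy_j = jaccard(pred_mask, gt_mask)

# Consistency check against the DAVIS 2017 validation row listed above.
davis_g = round(global_mean(75.4, 81.4), 1)
```

With the reported values J = 75.4 and F = 81.4, the mean is 78.4, which matches the listed D17 val (G) score.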
