TarViS: A Unified Approach for Target-based Video Segmentation

Ali Athar; Alexander Hermans; Jonathon Luiten; Deva Ramanan; Bastian Leibe

Abstract

The general domain of video segmentation is currently fragmented into different tasks spanning multiple benchmarks. Despite rapid progress in the state-of-the-art, current methods are overwhelmingly task-specific and cannot conceptually generalize to other tasks. Inspired by recent approaches with multi-task capability, we propose TarViS: a novel, unified network architecture that can be applied to any task that requires segmenting a set of arbitrarily defined 'targets' in video. Our approach is flexible with respect to how tasks define these targets, since it models the latter as abstract 'queries' which are then used to predict pixel-precise target masks. A single TarViS model can be trained jointly on a collection of datasets spanning different tasks, and can hot-swap between tasks during inference without any task-specific retraining. To demonstrate its effectiveness, we apply TarViS to four different tasks, namely Video Instance Segmentation (VIS), Video Panoptic Segmentation (VPS), Video Object Segmentation (VOS) and Point Exemplar-guided Tracking (PET). Our unified, jointly trained model achieves state-of-the-art performance on 5/7 benchmarks spanning these four tasks, and competitive performance on the remaining two. Code and model weights are available at: https://github.com/Ali2500/TarViS
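The idea of treating targets as queries can be illustrated with a toy sketch. The abstract does not say how queries are turned into masks, so the decoding step below (a dot product between each query vector and every per-pixel feature, thresholded at zero) is an assumption borrowed from common query-based segmentation designs, not the authors' implementation. The names `decode_masks`, `vis_queries` and `vos_queries` are hypothetical, and the feature values are made up for illustration.

```python
from typing import Dict, List

Vector = List[float]
Frame = List[List[Vector]]  # H x W grid of C-dimensional pixel features
Video = List[Frame]         # T frames


def dot(a: Vector, b: Vector) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def decode_masks(video: Video, queries: Dict[str, Vector]) -> Dict[str, List[List[List[int]]]]:
    """Return one binary T x H x W mask per target query.

    Assumption: a pixel belongs to a target when the inner product of the
    target's query with the pixel feature is positive.
    """
    masks = {}
    for name, query in queries.items():
        masks[name] = [
            [[1 if dot(query, pixel) > 0.0 else 0 for pixel in row] for row in frame]
            for frame in video
        ]
    return masks


# Toy video: 2 frames, each 1 x 3 pixels, with 2-dimensional features.
video: Video = [
    [[[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]],
    [[[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]],
]

# Hypothetical target sets for two tasks. In a VIS-style task the targets
# would be object classes; in a VOS-style task they would be the objects
# given in the first frame. Only the query set changes, not the decoder.
vis_queries = {"class:person": [1.0, -1.0]}
vos_queries = {"object_1": [-1.0, 1.0]}

vis_masks = decode_masks(video, vis_queries)
vos_masks = decode_masks(video, vos_queries)
```

Because the decoder only ever sees a set of query vectors, switching tasks in this sketch amounts to passing a different query set to the same function, which is the property the abstract describes as hot-swapping between tasks at inference time.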

Code Repositories

Ali2500/TarViS (official implementation, PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
video-instance-segmentation-on-ovis-1 | TarViS (ResNet-50) | AP50: 52.5; AP75: 30.4; AR1: 15.9; AR10: 39.9; mask AP: 31.1
video-instance-segmentation-on-ovis-1 | TarViS (Swin-L) | AP50: 67.8; AP75: 44.6; AR1: 18.0; AR10: 50.4; mask AP: 43.2
video-instance-segmentation-on-ovis-1 | TarViS (Swin-T) | AP50: 55.0; AP75: 34.4; AR1: 16.1; AR10: 40.9; mask AP: 34.0
video-instance-segmentation-on-youtube-vis-2 | TarViS (Swin-L) | AP50: 81.4; AP75: 67.6; AR1: 47.6; AR10: 64.8; mask AP: 60.2
video-instance-segmentation-on-youtube-vis-2 | TarViS (Swin-T) | AP50: 71.6; AP75: 56.6; AR1: 42.2; AR10: 57.2; mask AP: 50.9
video-instance-segmentation-on-youtube-vis-2 | TarViS (ResNet-50) | AP50: 69.6; AP75: 53.2; AR1: 40.5; AR10: 55.9; mask AP: 48.3
video-panoptic-segmentation-on-cityscapes-vps | TarViS (Swin-T) | VPQ: 58.0; VPQ (stuff): 69.0; VPQ (thing): 42.9
video-panoptic-segmentation-on-cityscapes-vps | TarViS (ResNet-50) | VPQ: 53.3; VPQ (stuff): 66.0; VPQ (thing): 35.9
video-panoptic-segmentation-on-cityscapes-vps | TarViS (Swin-L) | VPQ: 58.9; VPQ (stuff): 69.9; VPQ (thing): 43.7
video-panoptic-segmentation-on-kitti-step | TarViS (Swin-T) | AQ: 71.2; SQ: 69.9; STQ: 70.6
video-panoptic-segmentation-on-kitti-step | TarViS (Swin-L) | AQ: 72.0; SQ: 72.0; STQ: 73.0
video-panoptic-segmentation-on-kitti-step | TarViS (ResNet-50) | AQ: 70.3; SQ: 68.8; STQ: 69.6
video-panoptic-segmentation-on-vipseg | TarViS (ResNet-50) | STQ: 43.1; VPQ: 33.5
video-panoptic-segmentation-on-vipseg | TarViS (Swin-L) | STQ: 52.9; VPQ: 48.0
video-panoptic-segmentation-on-vipseg | TarViS (Swin-T) | STQ: 45.3; VPQ: 35.8
visual-object-tracking-on-davis-2017 | TarViS | F-measure (Mean): 88.5; J&F: 85.3; Jaccard (Mean): 81.7
