Seunghun Lee; Jiwan Seo; Kiljoon Han; Minwoo Choi; Sunghoon Im

Abstract
In this paper, we introduce Context-Aware Video Instance Segmentation (CAVIS), a novel framework designed to enhance instance association by integrating contextual information adjacent to each object. To extract and leverage this information efficiently, we propose the Context-Aware Instance Tracker (CAIT), which merges the contextual data surrounding each instance with the core instance features to improve tracking accuracy. Additionally, we introduce the Prototypical Cross-frame Contrastive (PCC) loss, which enforces consistency of object-level features across frames and thereby improves instance matching accuracy. CAVIS outperforms state-of-the-art methods on all benchmark datasets in video instance segmentation (VIS) and video panoptic segmentation (VPS). Notably, our method excels on the OVIS dataset, which is known for its particularly challenging videos.
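The abstract does not give the form of the PCC loss, so the following is only an illustrative sketch of the general idea of a prototype-based cross-frame contrastive objective, not the authors' implementation. It assumes that a prototype is the mask-averaged feature of an instance and that consistency is scored with an InfoNCE-style loss over cosine similarities. The function names (`masked_mean`, `cosine`, `cross_frame_contrastive_loss`) and the `temperature` parameter are hypothetical.

```python
import math


def masked_mean(features, mask):
    """Average the feature vectors whose mask entry is 1 (assumed prototype)."""
    selected = [f for f, m in zip(features, mask) if m]
    if not selected:
        raise ValueError("mask selects no positions")
    dim = len(selected[0])
    return [sum(f[d] for f in selected) / len(selected) for d in range(dim)]


def cosine(a, b, eps=1e-8):
    """Cosine similarity between two vectors, guarded against zero norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def cross_frame_contrastive_loss(protos_t, protos_next, temperature=0.1):
    """InfoNCE-style loss (an assumption, not taken from the paper).

    Prototype i in frame t is pulled toward prototype i in the next frame
    and pushed away from the prototypes of all other instances.
    """
    if len(protos_t) != len(protos_next):
        raise ValueError("both frames must list the same instances in order")
    total = 0.0
    for i, proto in enumerate(protos_t):
        logits = [cosine(proto, other) / temperature for other in protos_next]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(protos_t)


# Two instances whose features stay consistent across frames ...
frame_t = [[1.0, 0.0], [0.0, 1.0]]
consistent = [[0.9, 0.1], [0.1, 0.9]]
# ... versus the same instances with their identities swapped.
swapped = [[0.1, 0.9], [0.9, 0.1]]

loss_consistent = cross_frame_contrastive_loss(frame_t, consistent)
loss_swapped = cross_frame_contrastive_loss(frame_t, swapped)
```

Under these assumptions, `loss_consistent` is close to zero while `loss_swapped` is large, which is the behavior a cross-frame consistency objective is meant to reward.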
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| video-instance-segmentation-on-ovis-1 | CAVIS (ViT-L, Offline) | AP50: 82.6, AP75: 63.5, AR1: 21.2, AR10: 61.8, mask AP: 57.1 |
| video-instance-segmentation-on-youtube-vis-2 | CAVIS (ViT-L, Offline) | AP50: 87.3, AP75: 73.2, AR1: 49.7, AR10: 70.3, mask AP: 65.3 |
| video-instance-segmentation-on-youtube-vis-3 | CAVIS (ViT-L) | mAP_L: 48.6 |
| video-panoptic-segmentation-on-vipseg | CAVIS (ViT-L) | STQ: 56.1, VPQ: 58.5 |