SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation
Jiale Cao; Rao Muhammad Anwer; Hisham Cholakkal; Fahad Shahbaz Khan; Yanwei Pang; Ling Shao

Abstract
Single-stage instance segmentation approaches have recently gained popularity due to their speed and simplicity, but still lag behind two-stage methods in accuracy. We propose a fast single-stage instance segmentation method, called SipMask, that preserves instance-specific spatial information by separating the mask prediction of an instance across different sub-regions of a detected bounding box. Our main contribution is a novel lightweight spatial preservation (SP) module that generates a separate set of spatial coefficients for each sub-region within a bounding box, leading to improved mask predictions. It also enables accurate delineation of spatially adjacent instances. Further, we introduce a mask alignment weighting loss and a feature alignment scheme to better correlate mask prediction with object detection. On COCO test-dev, SipMask outperforms existing single-stage methods. Compared to the state-of-the-art single-stage TensorMask, SipMask obtains an absolute gain of 1.0% (mask AP) while providing a four-fold speedup. In terms of real-time capabilities, SipMask outperforms YOLACT with an absolute gain of 3.0% (mask AP) under similar settings, while operating at comparable speed on a Titan Xp. We also evaluate SipMask for real-time video instance segmentation, achieving promising results on the YouTube-VIS dataset. The source code is available at https://github.com/JialeCao001/SipMask.
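The idea of a separate set of coefficients per sub-region can be illustrated with a small, self-contained sketch. This is not the authors' implementation (see the repository linked above for that). It assumes a YOLACT-style setup in which a mask is a linear combination of shared basis maps, and it assumes a 2x2 grid of sub-regions, a sigmoid activation, and a 0.5 threshold. The function name `assemble_mask` and all of its parameters are hypothetical and chosen only for illustration.

```python
import math


def assemble_mask(bases, coeffs, box, grid=2, threshold=0.5):
    """Combine basis maps into a binary mask, one coefficient set per sub-region.

    bases:  list of K maps, each an H x W nested list of floats (assumed shared bases)
    coeffs: dict mapping (row, col) of a sub-region to its K coefficients
    box:    (x0, y0, x1, y1) in pixels, with x1 and y1 exclusive
    """
    height, width = len(bases[0]), len(bases[0][0])
    x0, y0, x1, y1 = box
    mask = [[0] * width for _ in range(height)]
    for y in range(y0, y1):
        # Which row of the grid this pixel falls into.
        row = min((y - y0) * grid // (y1 - y0), grid - 1)
        for x in range(x0, x1):
            # Which column of the grid this pixel falls into.
            col = min((x - x0) * grid // (x1 - x0), grid - 1)
            region_coeffs = coeffs[(row, col)]
            # Linear combination of the basis maps with this sub-region's coefficients.
            logit = sum(c * b[y][x] for c, b in zip(region_coeffs, bases))
            prob = 1.0 / (1.0 + math.exp(-logit))
            mask[y][x] = 1 if prob > threshold else 0
    return mask


# Toy example: two constant basis maps on a 4 x 4 image, box covering the whole image.
bases = [
    [[1.0] * 4 for _ in range(4)],
    [[0.5] * 4 for _ in range(4)],
]
# Only the top-left sub-region gets coefficients that switch the mask on.
coeffs = {
    (0, 0): [4.0, 2.0],
    (0, 1): [-4.0, -2.0],
    (1, 0): [-4.0, -2.0],
    (1, 1): [-4.0, -2.0],
}
mask = assemble_mask(bases, coeffs, box=(0, 0, 4, 4))
for line in mask:
    print(line)
```

With a single coefficient set per box, the same basis maps would produce the same response everywhere inside the box. Giving each sub-region its own coefficients lets the toy example switch on only the top-left quadrant, which is the kind of spatial distinction the abstract attributes to the SP module.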
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| instance-segmentation-on-coco | SipMask (ResNet-101, single-scale test) | mask AP: 38.1; AP50: 60.2; AP75: 40.8; APS: 17.8; APM: 40.8; APL: 54.3 |
| real-time-instance-segmentation-on-mscoco | SipMask++ (ResNet-101, single-scale test) | mask AP: 35.4; AP50: 55.6; AP75: 37.6; APS: 11.2; APM: 38.3; APL: 56.8; Frame (fps): 27.0 (Titan Xp) |
| real-time-instance-segmentation-on-mscoco | SipMask (ResNet-50, single-scale test) | mask AP: 31.2; AP50: 51.9; AP75: 32.3; APS: 9.2; APM: 33.6; APL: 49.8; Frame (fps): 41.7 (Titan Xp) |
| real-time-instance-segmentation-on-mscoco | SipMask (ResNet-101, single-scale test) | mask AP: 32.8; AP50: 53.4; AP75: 34.3; APS: 9.3; APM: 35.6; APL: 54.0; Frame (fps): 31.3 (Titan Xp) |
| video-instance-segmentation-on-youtube-vis-1 | SipMask (ResNet-50, single-scale test) | mask AP: 32.5; AP50: 53; AP75: 33.3; AR1: 33.5; AR10: 38.9 |
| video-instance-segmentation-on-youtube-vis-1 | SipMask (ResNet-50, ms-train, single-scale test) | mask AP: 33.7; AP50: 54.1; AP75: 35.8; AR1: 35.4; AR10: 40.1 |