Per-Pixel Classification is Not All You Need for Semantic Segmentation
Bowen Cheng, Alexander G. Schwing, Alexander Kirillov

Abstract
Modern approaches typically formulate semantic segmentation as a per-pixel classification task, while instance-level segmentation is handled with an alternative mask classification. Our key insight: mask classification is sufficiently general to solve both semantic- and instance-level segmentation tasks in a unified manner using the exact same model, loss, and training procedure. Following this observation, we propose MaskFormer, a simple mask classification model which predicts a set of binary masks, each associated with a single global class label prediction. Overall, the proposed mask classification-based method simplifies the landscape of effective approaches to semantic and panoptic segmentation tasks and shows excellent empirical results. In particular, we observe that MaskFormer outperforms per-pixel classification baselines when the number of classes is large. Our mask classification-based method outperforms both current state-of-the-art semantic (55.6 mIoU on ADE20K) and panoptic segmentation (52.7 PQ on COCO) models.
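The abstract describes the model output as a set of binary masks, each paired with one global class prediction, but it does not say how those pairs become a per-pixel semantic map. The sketch below shows one plausible way to do it, under stated assumptions: each prediction's class probabilities are weighted by its mask probability at a pixel, the weighted scores are summed over predictions, and the highest-scoring class wins. The function names, the trailing "no object" column, and the toy inputs are all hypothetical and are not taken from this page.

```python
from math import exp


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(value):
    return 1.0 / (1.0 + exp(-value))


def semantic_map(class_logits, mask_logits):
    """Combine N (class, mask) predictions into an H x W map of class indices.

    class_logits: N rows of K + 1 scores; the last column is an assumed
                  "no object" slot and is dropped before scoring.
    mask_logits:  N masks, each an H x W grid of logits.
    """
    num_classes = len(class_logits[0]) - 1
    probs = [softmax(row)[:num_classes] for row in class_logits]
    height, width = len(mask_logits[0]), len(mask_logits[0][0])

    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Score of class c at this pixel: sum over predictions of
            # (class probability) * (mask probability at the pixel).
            scores = [
                sum(
                    probs[i][c] * sigmoid(mask_logits[i][y][x])
                    for i in range(len(probs))
                )
                for c in range(num_classes)
            ]
            row.append(max(range(num_classes), key=scores.__getitem__))
        result.append(row)
    return result


# Toy input: two predictions, two classes, a 1 x 2 image.
toy_class_logits = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
toy_mask_logits = [[[4.0, -4.0]], [[-4.0, 4.0]]]
print(semantic_map(toy_class_logits, toy_mask_logits))
```

In the toy input, the first prediction favors class 0 and covers the left pixel, while the second favors class 1 and covers the right pixel, so each pixel takes the class of the mask that covers it. Because the class label is attached to a whole mask rather than to each pixel, the same outputs could in principle also be read as instance-level segments, which is the unification the abstract points to.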
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| panoptic-segmentation-on-ade20k-val | MaskFormer (R101 + 6 Enc) | PQ: 35.7 |
| panoptic-segmentation-on-coco-minival | MaskFormer (single-scale) | PQ: 52.7, PQst: 44.0, PQth: 58.5, RQ: 63.5, SQ: 81.8 |
| panoptic-segmentation-on-coco-test-dev | MaskFormer (Swin-L) | PQ: 53.3, PQst: 44.5, PQth: 59.1 |
| semantic-segmentation-on-ade20k | MaskFormer (Swin-B) | Validation mIoU: 53.8 |
| semantic-segmentation-on-ade20k | MaskFormer (ResNet-101) | Validation mIoU: 48.1 |
| semantic-segmentation-on-ade20k-val | MaskFormer (Swin-L, ImageNet-22k pretrain) | mIoU: 55.6 |
| semantic-segmentation-on-mapillary-val | MaskFormer (ResNet-50) | mIoU: 55.4 |