| kMaX-DeepLab (single-scale) | 58.5 | 49.0 | 64.8 | kMaX-DeepLab: k-means Mask Transformer | |
| Panoptic SegFormer (PVTv2-B5) | 55.8 | 46.5 | 61.9 | Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers | |
| CMT-DeepLab (single-scale) | 55.7 | 46.8 | 61.6 | CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation | |
| K-Net (Swin-L) | 55.2 | 46.2 | 61.2 | K-Net: Towards Unified Image Segmentation | |
| MaskConver (ResNet50, single-scale) | 53.6 | 45.6 | 58.9 | MaskConver: Revisiting Pure Convolution Model for Panoptic Segmentation | |
| Panoptic FCN* (Swin-L) | 52.7 | - | 59.4 | Fully Convolutional Networks for Panoptic Segmentation | |
| REFINE (ResNeXt-101-DCN) | 51.5 | 39.2 | 59.6 | REFINE: Prediction Fusion Network for Panoptic Segmentation | - |
| MaX-DeepLab-L (single-scale) | 51.3 | 42.4 | 57.2 | MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers | |
| Panoptic SegFormer (ResNet-101) | 50.9 | 43.0 | 56.2 | Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers | |
| Panoptic SegFormer (ResNet-50) | 50.2 | 42.4 | 55.3 | Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers | |
| DetectoRS (ResNeXt-101-64x4d, multi-scale) | 50.0 | 37.2 | 58.5 | DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution | |
| REFINE (ResNet-101-DCN) | 49.6 | 37.7 | 57.5 | REFINE: Prediction Fusion Network for Panoptic Segmentation | - |
| Ada-Segment (ResNet-101-DCN) | 48.5 | 37.6 | 55.7 | Ada-Segment: Automated Multi-loss Adaptation for Panoptic Segmentation | - |
| SpatialFlow (ResNet-101-FPN) | 48.5 | 37.9 | 55.5 | SpatialFlow: Bridging All Tasks for Panoptic Segmentation | |
| K-Net (R101-FPN-DCN) | 48.3 | 39.7 | 54.0 | K-Net: Towards Unified Image Segmentation | |