| DHR (Swin-L, Mask2Former) | 56.8 | DHR: Dual Features-Driven Hierarchical Rebalancing in Inter- and Intra-Class Regions for Weakly-Supervised Semantic Segmentation | |
| SemPLeS (Swin-L) | 56.1 | Semantic Prompt Learning for Weakly-Supervised Semantic Segmentation | |
| WSSS-SAM (DeepLabV2-ResNet101) | 55.6 | An Alternative to WSSS? An Empirical Study of the Segment Anything Model (SAM) on Weakly-Supervised Semantic Segmentation Problems | |
| FMA-WSSS (Swin-L) | 55.4 | Foundation Model Assisted Weakly Supervised Semantic Segmentation | |
| CoSA (Swin-B, multi-stage) | 53.7 | Weakly Supervised Co-training with Swapping Assignments for Semantic Segmentation | |
| CoSA (ViT-B, single-stage) | 51.1 | Weakly Supervised Co-training with Swapping Assignments for Semantic Segmentation | |
| WeakTr (ViT-S, multi-stage) | 50.3 | WeakTr: Exploring Plain Vision Transformer for Weakly-supervised Semantic Segmentation | |
| MARS (ResNet-101, multi-stage) | 49.4 | MARS: Model-agnostic Biased Object Removal without Additional Supervision for Weakly-Supervised Semantic Segmentation | |
| WeakTr (DeiT-S, multi-stage) | 46.9 | WeakTr: Exploring Plain Vision Transformer for Weakly-supervised Semantic Segmentation | |
| RS+EPM (ResNet-101, multi-stage) | 46.4 | RecurSeed and EdgePredictMix: Pseudo-Label Refinement Learning for Weakly Supervised Semantic Segmentation across Single- and Multi-Stage Frameworks | |
| T2MDiffusion (DeepLabV2-ResNet101) | 45.7 | From Text to Mask: Localizing Entities Using the Attention of Text-to-Image Diffusion Models | |
| FBR | 45.6 | Fine-grained Background Representation for Weakly Supervised Semantic Segmentation | |
| CLIP-ES (DeepLabV2-ResNet101) | 45.4 | CLIP is Also an Efficient Segmenter: A Text-Driven Approach for Weakly Supervised Semantic Segmentation | |
| ACR (DeepLabV1-ResNet38) | 45.3 | Weakly Supervised Semantic Segmentation via Adversarial Learning of Classifier and Reconstructor | - |
| BECO (DeepLabV3Plus+R101) | 45.1 | Boundary-Enhanced Co-Training for Weakly Supervised Semantic Segmentation | - |
| ACR-WSSS (DeepLabV2-ResNet101) | 45.0 | All-pairs Consistency Learning for Weakly Supervised Semantic Segmentation | |
| ViT-PCM | 45.0 | Max Pooling with Vision Transformers reconciles class and shape in weakly supervised semantic segmentation | |
| AMN (DeepLabV2-ResNet101) | 44.7 | Threshold Matters in WSSS: Manipulating the Activation for the Robust and Accurate Segmentation Model Against Thresholds | |
| L2G (DeepLabV2-ResNet101) | 44.2 | L2G: A Simple Local-to-Global Knowledge Transfer Framework for Weakly Supervised Semantic Segmentation | |
| RIB (DeepLabV2-ResNet101, No Saliency) | 43.8 | Reducing Information Bottleneck for Weakly Supervised Semantic Segmentation | |