HyperAIHyperAI

Command Palette

Search for a command to run...

5 months ago

PosSAM: Panoptic Open-vocabulary Segment Anything

Vibashan VS; Shubhankar Borse; Hyojin Park; Debasmit Das; Vishal Patel; Munawar Hayat; Fatih Porikli

PosSAM: Panoptic Open-vocabulary Segment Anything

Abstract

In this paper, we introduce an open-vocabulary panoptic segmentation model that effectively unifies the strengths of the Segment Anything Model (SAM) with the vision-language CLIP model in an end-to-end framework. While SAM excels in generating spatially-aware masks, it's decoder falls short in recognizing object class information and tends to oversegment without additional guidance. Existing approaches address this limitation by using multi-stage techniques and employing separate models to generate class-aware prompts, such as bounding boxes or segmentation masks. Our proposed method, PosSAM is an end-to-end model which leverages SAM's spatially rich features to produce instance-aware masks and harnesses CLIP's semantically discriminative features for effective instance classification. Specifically, we address the limitations of SAM and propose a novel Local Discriminative Pooling (LDP) module leveraging class-agnostic SAM and class-aware CLIP features for unbiased open-vocabulary classification. Furthermore, we introduce a Mask-Aware Selective Ensembling (MASE) algorithm that adaptively enhances the quality of generated masks and boosts the performance of open-vocabulary classification during inference for each image. We conducted extensive experiments to demonstrate our methods strong generalization properties across multiple datasets, achieving state-of-the-art performance with substantial improvements over SOTA open-vocabulary panoptic segmentation methods. In both COCO to ADE20K and ADE20K to COCO settings, PosSAM outperforms the previous state-of-the-art methods by a large margin, 2.4 PQ and 4.6 PQ, respectively. Project Website: https://vibashan.github.io/possam-web/.

Code Repositories

Vibashan/PosSAM
Official
pytorch
Mentioned in GitHub

Benchmarks

BenchmarkMethodologyMetrics
open-vocabulary-panoptic-segmentation-onPosSAM
PQ: 29.2
open-vocabulary-semantic-segmentation-on-3PosSAM
mIoU: 14.9

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding
Ready-to-use GPUs
Best Pricing
Get Started

Hyper Newsletters

Subscribe to our latest updates
We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning
Powered by MailChimp
PosSAM: Panoptic Open-vocabulary Segment Anything | Papers | HyperAI