3 months ago

OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation

Kwanyoung Kim, Yujin Oh, Jong Chul Ye

Abstract

The recent success of CLIP has demonstrated promising results in zero-shot semantic segmentation by transferring multimodal knowledge to pixel-level classification. However, existing approaches remain limited in how closely they can align text embeddings with pixel embeddings using pre-trained CLIP knowledge. To address this issue, we propose OTSeg, a novel multimodal attention mechanism aimed at enhancing the potential of multiple text prompts for matching associated pixel embeddings. We first propose Multi-Prompts Sinkhorn (MPS), based on the Optimal Transport (OT) algorithm, which leads multiple text prompts to selectively focus on various semantic features within image pixels. Moreover, inspired by the success of Sinkformers in unimodal settings, we introduce an extension of MPS, called Multi-Prompts Sinkhorn Attention (MPSA), which effectively replaces cross-attention mechanisms within the Transformer framework in multimodal settings. Through extensive experiments, we demonstrate that OTSeg achieves state-of-the-art (SOTA) performance with significant gains on Zero-Shot Semantic Segmentation (ZS3) tasks across three benchmark datasets.
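To make the optimal-transport idea concrete, the following is a minimal sketch of generic entropic-regularized Sinkhorn iterations between a handful of text prompts and pixels. It is not the authors' implementation: the function name `sinkhorn_plan`, the uniform marginals, the cost defined as one minus similarity, and the toy similarity values are all assumptions made for illustration.

```python
import math

def sinkhorn_plan(cost, eps=0.1, n_iters=200):
    """Generic Sinkhorn iterations for entropic-regularized optimal transport.

    cost: list of N rows (one per text prompt), each with M entries (one per pixel).
    Uniform marginals are assumed: each prompt carries mass 1/N, each pixel 1/M.
    Returns an N x M transport plan as nested lists.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n                      # prompt marginal (assumed uniform)
    b = [1.0 / m] * m                      # pixel marginal (assumed uniform)
    # Gibbs kernel: low cost -> large kernel value
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so they match the prompt marginal
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so they match the pixel marginal
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Columns match b exactly after the last update; rows match a only
    # approximately after a finite number of iterations.
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy example: 2 prompts, 4 pixels, hypothetical cosine similarities.
similarity = [
    [0.9, 0.8, 0.1, 0.2],   # prompt 0 resembles pixels 0 and 1
    [0.2, 0.1, 0.7, 0.9],   # prompt 1 resembles pixels 2 and 3
]
cost = [[1.0 - s for s in row] for row in similarity]
plan = sinkhorn_plan(cost)
for row in plan:
    print([round(t, 4) for t in row])
```

In this toy setting each prompt ends up carrying most of the mass for the pixels it is most similar to, which is the kind of selective prompt-to-pixel assignment the abstract describes. A smaller `eps` gives a sharper plan but slower convergence.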

Code Repositories

cubeyoung/OTSeg (official, PyTorch)

Benchmarks

| Benchmark | Method | Inductive Setting hIoU | Transductive Setting hIoU |
|---|---|---|---|
| Zero-Shot Semantic Segmentation on COCO-Stuff | OTSeg | 41.4 | 49.5 |
| Zero-Shot Semantic Segmentation on COCO-Stuff | OTSeg+ | 41.5 | 49.8 |
| Zero-Shot Semantic Segmentation on PASCAL VOC | OTSeg | 84.5 | 94.2 |
| Zero-Shot Semantic Segmentation on PASCAL VOC | OTSeg+ | 87.4 | 94.4 |
