Wang Zhaoqing ; Lu Yu ; Li Qiang ; Tao Xunqiang ; Guo Yandong ; Gong Mingming ; Liu Tongliang

Abstract
Referring image segmentation aims to segment a referent via a natural linguistic expression. Due to the distinct data properties of text and images, it is challenging for a network to align text and pixel-level features well. Existing approaches use pretrained models to facilitate learning, yet they transfer the language and vision knowledge from the pretrained models separately, ignoring the multi-modal correspondence information. Inspired by recent advances in Contrastive Language-Image Pretraining (CLIP), in this paper we propose an end-to-end CLIP-Driven Referring Image Segmentation framework (CRIS). To transfer the multi-modal knowledge effectively, CRIS resorts to vision-language decoding and contrastive learning to achieve text-to-pixel alignment. More specifically, we design a vision-language decoder that propagates fine-grained semantic information from textual representations to each pixel-level activation, which promotes consistency between the two modalities. In addition, we present text-to-pixel contrastive learning, which explicitly enforces the text feature to be similar to the related pixel-level features and dissimilar to the irrelevant ones. Experimental results on three benchmark datasets demonstrate that the proposed framework significantly outperforms state-of-the-art methods without any post-processing. The code will be released.
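The two mechanisms named in the abstract can be illustrated with a small pure-Python sketch. The abstract does not give exact formulations, so the following are assumptions. The decoder step is reduced to single-head scaled dot-product cross-attention, in which each pixel feature is a query and the word features serve as both keys and values, with no learned projections and one residual add. The text-to-pixel contrastive loss is written as a per-pixel binary cross-entropy on the sigmoid of the dot product between the sentence feature and each pixel feature, with pixels inside the referent mask treated as positives. The helper names `cross_attend` and `text_to_pixel_contrastive_loss` are hypothetical and are not taken from the CRIS code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def cross_attend(pixel_feats: Sequence[Vector], word_feats: Sequence[Vector]) -> List[List[float]]:
    """Hypothetical decoder step: every pixel (query) attends over all words.

    Assumes pixel and word features share one dimension D and uses no
    learned projections; a full decoder would typically add them and use
    multiple heads.
    """
    d = len(word_feats[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in pixel_feats:
        scores = [scale * dot(q, w) for w in word_feats]
        m = max(scores)  # subtract the max for a stable softmax
        weights = [math.exp(s - m) for s in scores]
        norm = sum(weights)
        attended = [
            sum(a * w[j] for a, w in zip(weights, word_feats)) / norm
            for j in range(d)
        ]
        # Residual connection: keep the visual signal, add the text context.
        out.append([x + y for x, y in zip(q, attended)])
    return out


def text_to_pixel_contrastive_loss(
    text_feat: Vector, pixel_feats: Sequence[Vector], mask: Sequence[int]
) -> float:
    """Assumed formulation: mean per-pixel binary cross-entropy on sigmoid(z_t . z_v).

    mask[i] == 1 marks a referent pixel (pulled toward the text feature);
    mask[i] == 0 marks an irrelevant pixel (pushed away from it).
    """
    if len(pixel_feats) != len(mask) or not mask:
        raise ValueError("need exactly one label per pixel")
    total = 0.0
    for z_v, y in zip(pixel_feats, mask):
        s = dot(text_feat, z_v)
        # -log(sigmoid(s)) = softplus(-s); -log(1 - sigmoid(s)) = softplus(s)
        total += softplus(-s) if y else softplus(s)
    return total / len(mask)
```

Consider `text = [1.0, 0.0]`, with a pixel at `[2.0, 0.0]` labelled as referent and a pixel at `[-2.0, 0.0]` labelled as background. The loss is small (about 0.127). Swapping the two labels raises it to about 2.127, which is the pull/push behaviour the abstract describes.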
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| generalized-referring-expression-segmentation | CRIS | cIoU: 55.34, gIoU: 56.27 |
| referring-expression-segmentation-on-refcoco | CRIS | Overall IoU: 70.47 |
| referring-expression-segmentation-on-refcoco-3 | CRIS | Overall IoU: 62.27 |
| referring-expression-segmentation-on-refcoco-4 | CRIS | Overall IoU: 68.08 |
| referring-expression-segmentation-on-refcoco-5 | CRIS | Overall IoU: 53.68 |
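The metrics in the table can be sketched as follows. This assumes the definitions conventionally used for referring segmentation, since they are not spelled out on this page. Overall IoU, also called cumulative IoU (cIoU), divides the total intersection by the total union accumulated over every sample of a split, so large objects dominate. gIoU, used for the generalized benchmark, averages the per-sample IoU. Scoring an empty prediction on an empty target as 1.0 follows the generalized setting's handling of no-target expressions and is an assumption here. Masks are represented as flat 0/1 lists purely for illustration, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

Mask = Sequence[int]  # flattened binary mask (illustrative representation)


def inter_union(pred: Mask, gt: Mask) -> Tuple[int, int]:
    if len(pred) != len(gt):
        raise ValueError("prediction and ground truth must have the same size")
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter, union


def overall_iou(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Cumulative IoU: total intersection over total union across the split."""
    total_i = total_u = 0
    for p, g in zip(preds, gts):
        i, u = inter_union(p, g)
        total_i += i
        total_u += u
    return total_i / total_u if total_u else 1.0


def mean_iou(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Average of per-sample IoU; empty-on-empty counts as 1.0 (assumed convention)."""
    scores = []
    for p, g in zip(preds, gts):
        i, u = inter_union(p, g)
        scores.append(i / u if u else 1.0)
    return sum(scores) / len(scores)
```

The example uses two samples. A large mask that is half correct contributes 4 of 8 pixels, and a two-pixel object is missed entirely. Together they give an overall IoU of 4/10 = 0.4 but a per-sample mean of (0.5 + 0) / 2 = 0.25, which shows why cumulative and averaged scores can differ on the same predictions.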