SimpleClick: Interactive Image Segmentation with Simple Vision Transformers

Qin Liu, Zhenlin Xu, Gedas Bertasius, Marc Niethammer

Abstract

Click-based interactive image segmentation aims to extract objects with a limited number of user clicks. A hierarchical backbone is the de facto architecture for current methods. Recently, the plain, non-hierarchical Vision Transformer (ViT) has emerged as a competitive backbone for dense prediction tasks. This design allows the original ViT to serve as a foundation model that can be fine-tuned for downstream tasks without redesigning a hierarchical backbone for pretraining. Although this design is simple and has proven effective, it has not yet been explored for interactive image segmentation. To fill this gap, we propose SimpleClick, the first interactive segmentation method that leverages a plain backbone. Based on the plain backbone, we introduce a symmetric patch embedding layer that encodes clicks into the backbone with only minor modifications to the backbone itself. With the plain backbone pretrained as a masked autoencoder (MAE), SimpleClick achieves state-of-the-art performance. Remarkably, our method achieves 4.15 NoC@90 on SBD, a 21.8% improvement over the previous best result. Extensive evaluation on medical images demonstrates the generalizability of our method. We further develop an extremely tiny ViT backbone for SimpleClick and provide a detailed computational analysis, highlighting its suitability as a practical annotation tool.
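The abstract describes the symmetric patch embedding only at a high level. The sketch below shows one plausible reading under stated assumptions, not the official uncbiag/simpleclick code: clicks are assumed to be rendered as disk maps (positive clicks, negative clicks, and the previous mask as three channels); the image and the click maps each pass through their own, structurally identical patch embedding (a bias-free linear projection of non-overlapping p x p patches, equivalent to a convolution with kernel size and stride p); and the two token sequences are summed element-wise before entering the plain ViT. The names `disk_map`, `patchify`, `project`, and `SymmetricPatchEmbed`, the disk radius, and the random initialization are all hypothetical. It is written in pure Python for readability rather than speed.

```python
import random

# Hypothetical sketch: function names, disk radius, channel layout, and
# initialization are illustrative assumptions, not the paper's implementation.


def disk_map(h, w, clicks, radius=2):
    """Return an h x w binary map with a filled disk around each (row, col) click."""
    m = [[0.0] * w for _ in range(h)]
    for r, c in clicks:
        for i in range(max(0, r - radius), min(h, r + radius + 1)):
            for j in range(max(0, c - radius), min(w, c + radius + 1)):
                if (i - r) ** 2 + (j - c) ** 2 <= radius ** 2:
                    m[i][j] = 1.0
    return m


def patchify(channels, p):
    """Flatten a C x H x W input into non-overlapping p x p patch vectors (row-major order)."""
    h, w = len(channels[0]), len(channels[0][0])
    assert h % p == 0 and w % p == 0, "H and W must be divisible by the patch size"
    patches = []
    for pi in range(0, h, p):
        for pj in range(0, w, p):
            patches.append([ch[i][j] for ch in channels
                            for i in range(pi, pi + p)
                            for j in range(pj, pj + p)])
    return patches


def project(vectors, weight):
    """Apply a bias-free linear map (dim x in_features) to each patch vector."""
    return [[sum(wk * x for wk, x in zip(row, v)) for row in weight] for v in vectors]


class SymmetricPatchEmbed:
    """Two structurally identical patch embeddings: one for the image, one for click maps."""

    def __init__(self, img_channels, click_channels, patch, dim, seed=0):
        rng = random.Random(seed)
        self.patch = patch
        self.w_img = [[rng.gauss(0.0, 0.02) for _ in range(img_channels * patch * patch)]
                      for _ in range(dim)]
        self.w_click = [[rng.gauss(0.0, 0.02) for _ in range(click_channels * patch * patch)]
                        for _ in range(dim)]

    def __call__(self, image, click_maps):
        img_tokens = project(patchify(image, self.patch), self.w_img)
        click_tokens = project(patchify(click_maps, self.patch), self.w_click)
        # Fuse by element-wise addition: token count and width are unchanged,
        # so the plain ViT blocks that follow need no modification.
        return [[a + b for a, b in zip(ti, tc)] for ti, tc in zip(img_tokens, click_tokens)]


# Usage: a 3 x 16 x 16 image, patch size 8, embedding width 16 -> 4 tokens of width 16.
H = W = 16
rng = random.Random(1)
image = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(3)]
clicks = [disk_map(H, W, [(4, 4)]),       # positive clicks
          disk_map(H, W, [(12, 10)]),     # negative clicks
          [[0.0] * W for _ in range(H)]]  # previous mask (empty before the first prediction)
embed = SymmetricPatchEmbed(img_channels=3, click_channels=3, patch=8, dim=16)
tokens = embed(image, clicks)
print(len(tokens), len(tokens[0]))  # 4 16
```

Summation (rather than concatenating extra channels into a single patch embedding) is what keeps the pretrained image patch embedding untouched in this sketch: with all-zero click maps, the output reduces exactly to the image-only projection.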

Code Repositories

uncbiag/simpleclick (official, PyTorch)
yihanhu-2022/diffmatte (PyTorch)

Benchmarks

| Benchmark | Model (backbone, training data) | NoC@85 | NoC@90 |
|---|---|---|---|
| Berkeley | SimpleClick (ViT-H, C+L) | - | 1.75 |
| Berkeley | SimpleClick (ViT-H, SBD) | - | 2.09 |
| DAVIS | SimpleClick (ViT-H, SBD) | 4.20 | 5.34 |
| DAVIS | SimpleClick (ViT-H, C+L) | 3.41 | 4.70 |
| GrabCut | SimpleClick (ViT-L, C+L) | 1.32 | 1.40 |
| GrabCut | SimpleClick (ViT-H, SBD) | 1.32 | 1.44 |
| SBD | SimpleClick | 2.51 | 4.15 |

NoC@85 / NoC@90: mean number of clicks needed to reach 85% / 90% IoU (lower is better). C+L: trained on COCO+LVIS; SBD: trained on SBD. A dash means no value is listed for that entry.
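NoC@k values like those above can be computed from per-click IoU curves. The sketch below assumes the convention common in click-based segmentation benchmarks: for each image, count the clicks until IoU first reaches the threshold; if it never does within the click budget (20 is a typical cap), count the full budget; then average over images. It also includes a helper for relative improvement, assuming a figure such as "21.8%" denotes a relative reduction in NoC. The function names and toy IoU values are hypothetical and are not SimpleClick results.

```python
# Hypothetical helpers for the NoC@k metric; the 20-click cap is an assumed default.


def noc_at(ious, threshold, max_clicks=20):
    """Clicks needed for one image: first 1-based click with IoU >= threshold, else max_clicks."""
    for k, iou in enumerate(ious[:max_clicks], start=1):
        if iou >= threshold:
            return k
    return max_clicks


def mean_noc(per_image_ious, threshold, max_clicks=20):
    """NoC@threshold averaged over a dataset; lower is better."""
    return sum(noc_at(s, threshold, max_clicks) for s in per_image_ious) / len(per_image_ious)


def relative_improvement(previous, new):
    """Fractional reduction in NoC relative to a previous result (positive = fewer clicks)."""
    return (previous - new) / previous


# Toy IoU-per-click curves for three images (illustrative values only).
curves = [[0.80, 0.88, 0.91], [0.92], [0.50, 0.60, 0.70]]
print(mean_noc(curves, 0.85, max_clicks=3))  # 2.0
print(mean_noc(curves, 0.90, max_clicks=3))  # 7/3, about 2.33
```

Because images that never reach the threshold are charged the full budget, the choice of cap directly affects reported NoC, so values are only comparable across methods evaluated with the same cap.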