HyperAIHyperAI

Command Palette

Search for a command to run...

3 months ago

X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion

X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion

Abstract

Copy-Paste is a simple and effective data augmentation strategy for instance segmentation. By randomly pasting object instances onto new background images, it creates new training data for free and significantly boosts the segmentation performance, especially for rare object categories. Although diverse, high-quality object instances used in Copy-Paste result in more performance gain, previous works utilize object instances either from human-annotated instance segmentation datasets or rendered from 3D object models, and both approaches are too expensive to scale up to obtain good diversity. In this paper, we revisit Copy-Paste at scale with the power of newly emerged zero-shot recognition models (e.g., CLIP) and text2image models (e.g., StableDiffusion). We demonstrate for the first time that using a text2image model to generate images or zero-shot recognition model to filter noisily crawled images for different object categories is a feasible way to make Copy-Paste truly scalable. To make such success happen, we design a data acquisition and processing framework, dubbed ``X-Paste", upon which a systematic study is conducted. On the LVIS dataset, X-Paste provides impressive improvements over the strong baseline CenterNet2 with Swin-L as the backbone. Specifically, it archives +2.6 box AP and +2.1 mask AP gains on all classes and even more significant gains with +6.8 box AP, +6.5 mask AP on long-tail classes. Our code and models are available at https://github.com/yoctta/XPaste.

Code Repositories

yoctta/xpaste
Official
pytorch
Mentioned in GitHub
aim-uofa/DiverGen
pytorch
Mentioned in GitHub

Benchmarks

BenchmarkMethodologyMetrics
instance-segmentation-on-coco-minivalCenterNet2 (Swin-L w/ X-Paste + Copy-Paste)
mask AP: 48.8
instance-segmentation-on-lvis-v1-0-valCenterNet2 (Swin-L w/ X-Paste + Copy-Paste)
mask AP: 45.4
mask APr: 43.8
object-detection-on-lvis-v1-0-valCenterNet2 (Swin-L w/ X-Paste + Copy-Paste)
box AP: 50.9
box APr: 48.7
open-vocabulary-object-detection-on-lvis-v1-0X-Paste
AP novel-LVIS base training: 21.4
AP novel-Unrestricted open-vocabulary training: 22.8

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding
Ready-to-use GPUs
Best Pricing
Get Started

Hyper Newsletters

Subscribe to our latest updates
We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning
Powered by MailChimp
X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion | Papers | HyperAI