
Detect Everything with Few Examples

Zhang Xinyu ; Liu Yuhan ; Wang Yuting ; Boularias Abdeslam

Abstract

Few-shot object detection aims at detecting novel categories given only a few example images. It is a basic skill for a robot to perform tasks in open environments. Recent methods focus on finetuning strategies, with complicated procedures that prohibit a wider application. In this paper, we introduce DE-ViT, a few-shot object detector without the need for finetuning. DE-ViT's novel architecture is based on a new region-propagation mechanism for localization. The propagated region masks are transformed into bounding boxes through a learnable spatial integral layer. Instead of training prototype classifiers, we propose to use prototypes to project ViT features into a subspace that is robust to overfitting on base classes. We evaluate DE-ViT on few-shot and one-shot object detection benchmarks with Pascal VOC, COCO, and LVIS. DE-ViT establishes new state-of-the-art results on all benchmarks. Notably, for COCO, DE-ViT surpasses the few-shot SoTA by 15 mAP on 10-shot and 7.2 mAP on 30-shot, and the one-shot SoTA by 2.8 AP50. For LVIS, DE-ViT outperforms the few-shot SoTA by 17 box APr. Further, we evaluate DE-ViT with a real robot by building a pick-and-place system for sorting novel objects based on example images. The videos of our robot demonstrations, the source code, and the models of DE-ViT can be found at https://mlzxy.github.io/devit.
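
The idea of using prototypes as a projection rather than as a trained classifier can be illustrated with a toy sketch. This is not DE-ViT's code: the 3-dimensional vectors stand in for ViT features, the class names and helper functions are hypothetical, and the choice of mean prototypes with cosine similarity is an assumption made only for illustration.

```python
import math

def mean_vector(vectors):
    """Average equal-length feature vectors into a single prototype."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def project_onto_prototypes(feature, prototypes):
    """Re-express a feature as its similarity to each class prototype.

    No weights are trained here: the prototypes themselves define the
    coordinates of the new, lower-dimensional representation.
    """
    return {name: cosine(feature, proto) for name, proto in prototypes.items()}

# Hypothetical support set: two example features per novel class.
support = {
    "mug": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    "pen": [[0.0, 1.0, 0.0], [0.0, 0.8, 0.2]],
}
prototypes = {name: mean_vector(vecs) for name, vecs in support.items()}

# A query feature is mapped into the prototype-defined space.
query = [0.9, 0.1, 0.0]
projected = project_onto_prototypes(query, prototypes)
best = max(projected, key=projected.get)
print(best, projected)
```

Adding a new category in this sketch only requires adding its example features to `support`; nothing is finetuned, which mirrors the motivation stated in the abstract.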

Code Repositories

mlzxy/devit

Official implementation (PyTorch)


Benchmarks

Benchmark	Methodology	Metrics
cross-domain-few-shot-object-detection-on	DE-ViT-FT	mAP: 49.2
cross-domain-few-shot-object-detection-on-1	DE-ViT-FT	mAP: 40.8
cross-domain-few-shot-object-detection-on-2	DE-ViT-FT	mAP: 25.6
cross-domain-few-shot-object-detection-on-3	DE-ViT-FT	mAP: 21.3
cross-domain-few-shot-object-detection-on-4	DE-ViT-FT	mAP: 5.4
cross-domain-few-shot-object-detection-on-neu	DE-ViT-FT	mAP: 8.8
few-shot-object-detection-on-ms-coco-10-shot	DE-ViT	AP: 34.0
few-shot-object-detection-on-ms-coco-30-shot	DE-ViT	AP: 34
one-shot-object-detection-on-coco	DE-ViT	AP 0.5: 28.4
open-vocabulary-object-detection-on-lvis-v1-0	DE-ViT	AP novel-LVIS base training: 34.3
open-vocabulary-object-detection-on-mscoco	DE-ViT	AP 0.5: 50
