Zhang Xinyu; Liu Yuhan; Wang Yuting; Boularias Abdeslam

Abstract
Few-shot object detection aims at detecting novel categories given only a few example images. It is a basic skill for a robot to perform tasks in open environments. Recent methods focus on finetuning strategies, with complicated procedures that prohibit a wider application. In this paper, we introduce DE-ViT, a few-shot object detector without the need for finetuning. DE-ViT's novel architecture is based on a new region-propagation mechanism for localization. The propagated region masks are transformed into bounding boxes through a learnable spatial integral layer. Instead of training prototype classifiers, we propose to use prototypes to project ViT features into a subspace that is robust to overfitting on base classes. We evaluate DE-ViT on few-shot and one-shot object detection benchmarks with Pascal VOC, COCO, and LVIS. DE-ViT establishes new state-of-the-art results on all benchmarks. Notably, for COCO, DE-ViT surpasses the few-shot SoTA by 15 mAP on 10-shot and 7.2 mAP on 30-shot, and the one-shot SoTA by 2.8 AP50. For LVIS, DE-ViT outperforms the few-shot SoTA by 17 box APr. Further, we evaluate DE-ViT with a real robot by building a pick-and-place system for sorting novel objects based on example images. The videos of our robot demonstrations, the source code, and the models of DE-ViT can be found at https://mlzxy.github.io/devit.
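To make the prototype-projection idea in the abstract more concrete, here is a minimal sketch in plain Python. It is not the DE-ViT implementation: the function names (`build_prototypes`, `project_onto_prototypes`), the use of class-mean prototypes, and the choice of cosine similarity as the projection are all assumptions made for illustration. The sketch only shows the general pattern of replacing a high-dimensional feature with its similarities to a small set of class prototypes.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def build_prototypes(support_feats, support_labels):
    """Average the support features of each class into one prototype.

    Hypothetical helper: assumes prototypes are simple class means.
    Returns a dict ordered by class label.
    """
    by_class = {}
    for feat, label in zip(support_feats, support_labels):
        by_class.setdefault(label, []).append(feat)
    return {label: mean_vector(feats) for label, feats in sorted(by_class.items())}


def cosine(u, v, eps=1e-8):
    """Cosine similarity between two vectors (eps avoids division by zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def project_onto_prototypes(feat, prototypes):
    """Map a D-dimensional feature to one similarity score per prototype.

    Assumption: the "subspace" is the vector of cosine similarities
    against the class prototypes, in label order.
    """
    return [cosine(feat, proto) for proto in prototypes.values()]


# Toy example: two classes, two support features each (made-up numbers).
support_feats = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.8, 0.2],
]
support_labels = [0, 0, 1, 1]

prototypes = build_prototypes(support_feats, support_labels)
projected = project_onto_prototypes([1.0, 0.05, 0.0], prototypes)
```

In this toy setup the query feature lies close to the class-0 support features, so its first projected coordinate is larger than its second. The projected vector has one entry per class, regardless of the original feature dimension.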
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| cross-domain-few-shot-object-detection-on | DE-ViT-FT | mAP: 49.2 |
| cross-domain-few-shot-object-detection-on-1 | DE-ViT-FT | mAP: 40.8 |
| cross-domain-few-shot-object-detection-on-2 | DE-ViT-FT | mAP: 25.6 |
| cross-domain-few-shot-object-detection-on-3 | DE-ViT-FT | mAP: 21.3 |
| cross-domain-few-shot-object-detection-on-4 | DE-ViT-FT | mAP: 5.4 |
| cross-domain-few-shot-object-detection-on-neu | DE-ViT-FT | mAP: 8.8 |
| few-shot-object-detection-on-ms-coco-10-shot | DE-ViT | AP: 34.0 |
| few-shot-object-detection-on-ms-coco-30-shot | DE-ViT | AP: 34 |
| one-shot-object-detection-on-coco | DE-ViT | AP 0.5: 28.4 |
| open-vocabulary-object-detection-on-lvis-v1-0 | DE-ViT | AP novel-LVIS base training: 34.3 |
| open-vocabulary-object-detection-on-mscoco | DE-ViT | AP 0.5: 50 |