5 months ago

CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching

Wu Xiaoshi; Zhu Feng; Zhao Rui; Li Hongsheng


Abstract

Open-vocabulary detection (OVD) is an object detection task aiming at detecting objects from novel categories beyond the base categories on which the detector is trained. Recent OVD methods rely on large-scale visual-language pre-trained models, such as CLIP, for recognizing novel objects. We identify the two core obstacles that need to be tackled when incorporating these models into detector training: (1) the distribution mismatch that happens when applying a VL-model trained on whole images to region recognition tasks; (2) the difficulty of localizing objects of unseen classes. To overcome these obstacles, we propose CORA, a DETR-style framework that adapts CLIP for Open-vocabulary detection by Region prompting and Anchor pre-matching. Region prompting mitigates the whole-to-region distribution gap by prompting the region features of the CLIP-based region classifier. Anchor pre-matching helps learning generalizable object localization by a class-aware matching mechanism. We evaluate CORA on the COCO OVD benchmark, where we achieve 41.7 AP50 on novel classes, which outperforms the previous SOTA by 2.4 AP50 even without resorting to extra training data. When extra training data is available, we train CORA$^+$ on both ground-truth base-category annotations and additional pseudo bounding box labels computed by CORA. CORA$^+$ achieves 43.1 AP50 on the COCO OVD benchmark and 28.1 box APr on the LVIS OVD benchmark.
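
The two mechanisms named in the abstract can be illustrated with a toy sketch. This is not the official implementation (see the repository listed below). It assumes that the region prompt is added element-wise to the region feature, that class scores are a softmax over cosine similarities with class-name text embeddings, and that pre-matching restricts each ground-truth box to anchors conditioned on the same class label. All function names, the temperature value and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def classify_region(region_feat, prompt, text_embeds, temperature=0.01):
    """Score one region feature against class-name text embeddings.

    Assumption: the prompt is combined with the region feature by
    element-wise addition before comparison with the text embeddings.
    """
    prompted = [f + p for f, p in zip(region_feat, prompt)]
    logits = {name: cosine(prompted, emb) / temperature
              for name, emb in text_embeds.items()}
    # Numerically stable softmax over the class logits.
    top = max(logits.values())
    exps = {name: math.exp(value - top) for name, value in logits.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def pre_match(gt_labels, anchor_labels):
    """Class-aware pre-matching (assumed form).

    Each ground-truth box may only be assigned to anchors that carry the
    same class label. Returns, per ground-truth index, the list of
    candidate anchor indices; a box-level matching step would follow.
    """
    return {g: [a for a, a_label in enumerate(anchor_labels) if a_label == g_label]
            for g, g_label in enumerate(gt_labels)}


# Toy usage with 2-dimensional embeddings (hypothetical values).
text_embeds = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
probs = classify_region([0.9, 0.1], [0.0, 0.0], text_embeds)
candidates = pre_match(["cat", "dog"], ["dog", "cat", "cat"])
```

Because the class vocabulary enters only through the text embeddings and the anchor labels, new category names can be supplied at inference time without changing the code, which is the property an open-vocabulary detector relies on.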

Code Repositories

tgxs002/cora (official, PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
described-object-detection-on-description | CORA-R50 | Intra-scenario ABS mAP: 5.0; Intra-scenario FULL mAP: 6.2; Intra-scenario PRES mAP: 6.7
open-vocabulary-object-detection-on-mscoco | CORA | AP 0.5: 41.7
open-vocabulary-object-detection-on-mscoco | CORA+ | AP 0.5: 43.1
