Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

Shilong Liu; Zhaoyang Zeng; Tianhe Ren; Feng Li; Hao Zhang; Jie Yang; Qing Jiang; Chunyuan Li; Jianwei Yang; Hang Su; Jun Zhu; Lei Zhang

Abstract

In this paper, we present an open-set object detector, called Grounding DINO, by marrying the Transformer-based detector DINO with grounded pre-training, which can detect arbitrary objects with human inputs such as category names or referring expressions. The key to open-set object detection is introducing language to a closed-set detector for open-set concept generalization. To effectively fuse language and vision modalities, we conceptually divide a closed-set detector into three phases and propose a tight fusion solution, which includes a feature enhancer, a language-guided query selection, and a cross-modality decoder for cross-modality fusion. While previous works mainly evaluate open-set object detection on novel categories, we propose to also perform evaluations on referring expression comprehension for objects specified with attributes. Grounding DINO performs remarkably well on all three settings, including benchmarks on COCO, LVIS, ODinW, and RefCOCO/+/g. Grounding DINO achieves 52.5 AP on the COCO detection zero-shot transfer benchmark, i.e., without any training data from COCO. It sets a new record on the ODinW zero-shot benchmark with a mean AP of 26.1. Code will be available at https://github.com/IDEA-Research/GroundingDINO.
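The abstract names a language-guided query selection step but does not spell it out here. One plausible reading, offered only as a minimal sketch, is that each image token is scored by its best dot-product match against the text tokens, and the top-scoring image tokens become the decoder queries. The function name `select_queries`, the toy feature vectors, and the tie-breaking rule are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def select_queries(
    image_feats: List[Vector],
    text_feats: List[Vector],
    num_queries: int,
) -> List[int]:
    """Return indices of the image tokens most similar to any text token.

    Hypothetical helper: assumes similarity is a plain dot product.
    """
    # Score each image token by its best match over all text tokens.
    scores = [max(dot(img, txt) for txt in text_feats) for img in image_feats]
    # Rank by descending score; break ties by the lower index.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return ranked[:num_queries]


# Toy data (assumed): four image tokens and two text tokens in 2-D.
image_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
text_feats = [[1.0, 0.0], [0.9, 0.1]]

selected = select_queries(image_feats, text_feats, num_queries=2)
print(selected)
```

In this toy case the first and third image tokens align best with the text, so they are the ones kept as queries. A real detector would apply the same idea to batched tensors rather than Python lists.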

Code Repositories

idea-research/groundingdino (Official, PyTorch)
longzw1997/Open-GroundingDino (Mentioned in GitHub, PyTorch)
IDEA-Research/Grounded-Segment-Anything (Mentioned in GitHub, PyTorch)
idea-research/grounded-sam-2 (Mentioned in GitHub, PyTorch)
huggingface/transformers (Mentioned in GitHub, PyTorch)
idea-research/dino-x-api (Mentioned in GitHub)
hzlbbfrog/generative-bim (Mentioned in GitHub)

Benchmarks

| Benchmark | Methodology | Metrics |
| --- | --- | --- |
| object-detection-on-coco | Grounding DINO | box mAP: 63.0 |
| object-detection-on-coco-minival | Grounding DINO | box AP: 63.0 |
| object-detection-on-odinw-full-shot-13-tasks | Grounding DINO | AP: 70.9 |
| zero-shot-object-detection-on-lvis-v1-0 | GroundingDINO-L | AP: 33.9 |
| zero-shot-object-detection-on-mscoco | Grounding DINO-L (without COCO data) | AP: 52.5 |
| zero-shot-object-detection-on-odinw | Grounding DINO | Average Score: 26.1 |
| zero-shot-segmentation-on-segmentation-in-the | Grounded-SAM | Mean AP: 46.0 |
