Tianhe Ren, Qing Jiang, Shilong Liu, Zhaoyang Zeng, Wenlong Liu, Han Gao, Hongjie Huang, Zhengyu Ma, Xiaoke Jiang, Yihao Chen, Yuda Xiong, Hao Zhang, Feng Li, Peijun Tang, Kent Yu, Lei Zhang

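Abstract
This paper introduces Grounding DINO 1.5, a suite of advanced open-set object detection models developed by IDEA Research, which aims to advance the "frontier" of open-set object detection. The suite comprises two models: Grounding DINO 1.5 Pro, a high-performance model designed for stronger generalization across a wide range of scenarios, and Grounding DINO 1.5 Edge, an efficient model optimized for the faster speed required by many applications that need edge deployment. Grounding DINO 1.5 Pro achieves richer semantic understanding by scaling up the model architecture, integrating an enhanced vision backbone, and expanding the training dataset to more than 20 million images with grounding annotations. Grounding DINO 1.5 Edge uses reduced feature scales for efficiency, yet retains strong detection capability because it is trained on the same comprehensive dataset. Empirical results demonstrate the effectiveness of Grounding DINO 1.5: the Pro model attains 54.3 AP on the COCO detection benchmark and 55.7 AP on the LVIS-minival zero-shot transfer benchmark, setting new records for open-set object detection. In addition, when optimized with TensorRT, the Edge model reaches 75.2 FPS while achieving a zero-shot performance of 36.2 AP on LVIS-minival, which makes it better suited to edge computing scenarios. Model examples and demos (with API) will be released at: https://github.com/IDEA-Research/Grounding-DINO-1.5-API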
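
To put the reported Edge throughput in more familiar terms, the short sketch below converts frames per second into an average per-image latency. It assumes the 75.2 FPS figure describes sequential single-image inference, which the abstract does not state explicitly; `fps_to_latency_ms` is a hypothetical helper written for this illustration, not part of any Grounding DINO API.

```python
def fps_to_latency_ms(fps: float) -> float:
    """Convert throughput in frames per second to average latency in milliseconds.

    Assumes frames are processed one after another (no batching or pipelining),
    so latency is simply the reciprocal of throughput.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# 75.2 FPS is the TensorRT-optimized Edge figure quoted in the abstract.
edge_latency_ms = fps_to_latency_ms(75.2)
print(f"{edge_latency_ms:.1f} ms per image")  # prints "13.3 ms per image"
```

Under that assumption, the Edge model spends roughly 13 ms per image, comfortably inside the 33 ms budget of a 30 FPS video stream.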
Code repositories
idea-research/grounded-sam-2 (PyTorch; mentioned on GitHub)
mit-han-lab/efficientvit (PyTorch; mentioned on GitHub)
idea-research/grounding-dino-1.5-api (official; mentioned on GitHub)
Benchmarks
| Benchmark | Method | Metric |
|---|---|---|
| few-shot-object-detection-on-odinw-13 | Grounding DINO 1.5 Pro | Average Score: 66.3 |
| few-shot-object-detection-on-odinw-35 | Grounding DINO 1.5 Pro | Average Score: 54.7 |
| object-detection-on-lvis-v1-0-minival | Grounding DINO 1.5 Pro | box AP: 68.1 |
| object-detection-on-lvis-v1-0-val | Grounding DINO 1.5 Pro | box AP: 63.5; box APr: 64.0 |
| object-detection-on-odinw-full-shot-13-tasks | Grounding DINO 1.5 Pro | AP: 72.4 |
| object-detection-on-odinw-full-shot-35-tasks | Grounding DINO 1.5 Pro | AP: 72.4 |
| zero-shot-object-detection-on-lvis-v1-0 | Grounding DINO 1.6 Pro (without LVIS data) | AP: 57.7 |
| zero-shot-object-detection-on-lvis-v1-0 | Grounding DINO 1.5 Pro (without LVIS data) | AP: 55.7 |
| zero-shot-object-detection-on-lvis-v1-0-val | Grounding DINO 1.6 Pro (without LVIS data) | AP: 51.1 |
| zero-shot-object-detection-on-lvis-v1-0-val | Grounding DINO 1.5 Pro (without LVIS data) | AP: 47.7 |
| zero-shot-object-detection-on-mscoco | Grounding DINO 1.5 Pro (without COCO data) | AP: 54.3 |
| zero-shot-object-detection-on-mscoco | Grounding DINO 1.6 Pro (without COCO data) | AP: 55.4 |
| zero-shot-object-detection-on-odinw | Grounding DINO 1.5 Pro | Average Score: 30.2 |
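
The table lists both Grounding DINO 1.5 Pro and 1.6 Pro on three zero-shot benchmarks. As a small illustration, the sketch below computes the AP difference between the two versions on each of them. The values are copied from the rows above; the dictionary layout and the `ap_gains` helper are assumptions made for this example only.

```python
# Zero-shot AP values copied from the benchmark table, keyed by benchmark slug.
zero_shot_ap = {
    "zero-shot-object-detection-on-lvis-v1-0": {"1.5 Pro": 55.7, "1.6 Pro": 57.7},
    "zero-shot-object-detection-on-lvis-v1-0-val": {"1.5 Pro": 47.7, "1.6 Pro": 51.1},
    "zero-shot-object-detection-on-mscoco": {"1.5 Pro": 54.3, "1.6 Pro": 55.4},
}


def ap_gains(table: dict) -> dict:
    """Return the AP difference (1.6 Pro minus 1.5 Pro) for each benchmark.

    Results are rounded to one decimal place to match the table's precision
    and to hide floating-point noise from the subtraction.
    """
    return {
        benchmark: round(scores["1.6 Pro"] - scores["1.5 Pro"], 1)
        for benchmark, scores in table.items()
    }


for benchmark, gain in ap_gains(zero_shot_ap).items():
    print(f"{benchmark}: +{gain} AP")
```

On these rows the listed 1.6 Pro results are higher than the 1.5 Pro results by 2.0, 3.4, and 1.1 AP respectively.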