
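Abstract
All instance perception tasks aim to find specific objects designated by queries such as category names, language expressions, and target annotations, yet this whole field has been split into multiple independent subtasks. In this work, we present a next-generation universal instance perception model, termed UNINEXT. UNINEXT reformulates diverse instance perception tasks into a unified object discovery and retrieval paradigm, and it can flexibly perceive different types of objects by simply changing the input prompts. This unified formulation brings the following benefits: (1) large amounts of data from different tasks and label vocabularies can be used for joint training, yielding general instance-level representations, which is especially beneficial for tasks that lack training data. (2) The unified model is parameter-efficient and saves redundant computation when handling multiple tasks simultaneously. UNINEXT performs strongly on 20 challenging benchmarks across 10 instance-level tasks, including classical image-level tasks (object detection and instance segmentation), vision-and-language tasks (referring expression comprehension and segmentation), and six video-level object tracking tasks. Code is available at https://github.com/MasterBin-IIAU/UNINEXT.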
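The "object discovery and retrieval" idea in the abstract can be illustrated with a toy sketch. This is not UNINEXT's implementation: the proposal names, the three-dimensional embeddings, the `retrieve` helper, and the 0.5 threshold are all hypothetical, and cosine similarity is used only as a stand-in for whatever matching the real model performs. The point it shows is that the same set of discovered instances can serve different tasks when only the prompt changes.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(instances, prompt_embedding, threshold=0.5):
    """Return (name, score) pairs that match the prompt, best match first."""
    scored = [(name, cosine(emb, prompt_embedding)) for name, emb in instances]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: -pair[1])

# Hypothetical instance proposals for one image (the "discovery" step).
instances = [
    ("proposal_0", [0.9, 0.1, 0.0]),
    ("proposal_1", [0.1, 0.9, 0.1]),
    ("proposal_2", [0.0, 0.2, 0.9]),
]

# Hypothetical prompt embeddings, one per query type named in the abstract.
prompts = {
    "category name": [1.0, 0.0, 0.0],
    "language expression": [0.0, 1.0, 0.0],
    "target annotation": [0.0, 0.0, 1.0],
}

# The "retrieval" step: same instances, different prompt, different result.
results = {kind: retrieve(instances, emb) for kind, emb in prompts.items()}
for kind, matches in results.items():
    print(kind, "->", [name for name, _ in matches])
```

With these made-up vectors, each prompt type selects a different proposal from the same pool, which is the sense in which a single model can switch tasks by switching prompts.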
Code Repositories
MasterBin-IIAU/UNINEXT (official, PyTorch)
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| described-object-detection-on-description | UNINEXT-large | Intra-scenario ABS mAP: 15.9 Intra-scenario FULL mAP: 17.9 Intra-scenario PRES mAP: 18.6 |
| generalized-referring-expression | UNINEXT | N-acc.: 50.6 Precision@(F1=1, IoU≥0.5): 58.2 |
| instance-segmentation-on-coco | UNINEXT-H | AP50: 76.2 AP75: 56.7 APL: 67.5 APM: 55.9 APS: 33.3 mask AP: 51.8 |
| multi-object-tracking-and-segmentation-on-3 | UNINEXT-H | mMOTSA: 35.7 |
| multiple-object-tracking-on-bdd100k-val | UNINEXT-H | AssocA: - TETA: - mIDF1: 56.7 mMOTA: 44.2 |
| object-detection-on-coco-minival | UNINEXT-H | AP50: 77.5 AP75: 66.7 APL: 75.3 APM: 64.8 APS: 45.1 box AP: 60.6 |
| referring-expression-segmentation-on-davis | UNINEXT-H | J&F 1st frame: 72.5 |
| referring-expression-segmentation-on-refcoco | UNINEXT-H | Overall IoU: 82.19 |
| referring-expression-segmentation-on-refcoco-3 | UNINEXT-H | Overall IoU: 72.47 |
| referring-expression-segmentation-on-refcoco-4 | UNINEXT-H | Overall IoU: 76.42 |
| referring-expression-segmentation-on-refcoco-5 | UNINEXT-H | Overall IoU: 66.22 |
| referring-expression-segmentation-on-refer-1 | UNINEXT-H | F: 72.7 J: 67.6 J&F: 70.1 |
| video-instance-segmentation-on-ovis-1 | UNINEXT (ViT-H, Online) | AP50: 72.5 AP75: 52.2 mask AP: 49.0 |
| video-instance-segmentation-on-ovis-1 | UNINEXT (ResNet-50, Online) | AP50: 55.5 AP75: 35.6 mask AP: 34.0 |
| visual-object-tracking-on-lasot | UNINEXT-L | AUC: 72.4 Normalized Precision: 80.7 Precision: 78.9 |
| visual-object-tracking-on-lasot | UNINEXT-H | AUC: 72.2 Normalized Precision: 80.8 Precision: 79.4 |
| visual-object-tracking-on-lasot-ext | UNINEXT-H | AUC: 56.2 Normalized Precision: 63.8 Precision: 63.8 |
| visual-object-tracking-on-trackingnet | UNINEXT-H | Accuracy: 85.4 Normalized Precision: 89.0 Precision: 86.4 |
| visual-tracking-on-tnl2k | UNINEXT-H | AUC: 59.3 precision: 62.8 |
| zero-shot-segmentation-on-segmentation-in-the | UNINEXT | Mean AP: 42.1 |
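
As a reading aid for the table, J&F in video segmentation benchmarks is conventionally the mean of region similarity (J) and contour accuracy (F). The short check below applies that to the one row above that reports all three values. The variable names are illustrative, and the small gap is assumed to come from rounding of the published figures.

```python
# Values copied from the referring-expression-segmentation-on-refer-1 row.
j_score = 67.6
f_score = 72.7
reported_jf = 70.1

# J&F is the arithmetic mean of J and F.
computed_jf = (j_score + f_score) / 2
gap = abs(computed_jf - reported_jf)

print(f"computed J&F = {computed_jf:.2f}, reported = {reported_jf}, gap = {gap:.2f}")
```

The computed mean agrees with the reported value to within 0.1, which is what one would expect if J and F were each rounded to one decimal place before publication.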