Bin Yan, Yi Jiang, Jiannan Wu, Dong Wang, Ping Luo, Zehuan Yuan, Huchuan Lu

Abstract
All instance perception tasks aim at finding certain objects specified by some queries such as category names, language expressions, and target annotations, but this complete field has been split into multiple independent subtasks. In this work, we present a universal instance perception model of the next generation, termed UNINEXT. UNINEXT reformulates diverse instance perception tasks into a unified object discovery and retrieval paradigm and can flexibly perceive different types of objects by simply changing the input prompts. This unified formulation brings the following benefits: (1) enormous data from different tasks and label vocabularies can be exploited for jointly training general instance-level representations, which is especially beneficial for tasks lacking in training data. (2) The unified model is parameter-efficient and can save redundant computation when handling multiple tasks simultaneously. UNINEXT shows superior performance on 20 challenging benchmarks from 10 instance-level tasks including classical image-level tasks (object detection and instance segmentation), vision-and-language tasks (referring expression comprehension and segmentation), and six video-level object tracking tasks. Code is available at https://github.com/MasterBin-IIAU/UNINEXT.
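The abstract's central idea, tasks unified as "prompt in, matching instances out", can be illustrated with a minimal sketch. This is purely hypothetical pseud
-style code, not the UNINEXT API: it assumes the model has already produced per-instance embeddings, and shows how retrieval against a prompt embedding (from category names, a language expression, or a target annotation) would reduce every task to the same scoring step.

```python
import numpy as np

def retrieve_instances(prompt_emb, instance_embs, threshold=0.5):
    """Score discovered instances against a task prompt and keep matches.

    prompt_emb:    (D,) embedding of the prompt (category name, language
                   expression, or target-annotation crop -- illustrative)
    instance_embs: (N, D) embeddings of N candidate instances
    Returns (indices of matches, cosine-similarity scores).
    """
    p = prompt_emb / np.linalg.norm(prompt_emb)
    x = instance_embs / np.linalg.norm(instance_embs, axis=1, keepdims=True)
    scores = x @ p                       # cosine similarity per instance
    return np.where(scores > threshold)[0], scores

# Toy example: 3 candidate instances; the prompt is closest to instance 0.
prompt = np.array([1.0, 0.0, 0.0])
instances = np.array([[0.9, 0.1, 0.0],   # strong match
                      [0.0, 1.0, 0.0],   # unrelated
                      [0.6, 0.6, 0.0]])  # partial match
idx, scores = retrieve_instances(prompt, instances)
print(idx.tolist())  # instances 0 and 2 exceed the threshold
```

Under this view, switching between detection, referring segmentation, and tracking only changes how `prompt_emb` is produced; the discovery and retrieval machinery stays shared, which is what enables joint training and parameter sharing across tasks.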
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| described-object-detection-on-description | UNINEXT-large | Intra-scenario ABS mAP: 15.9, Intra-scenario FULL mAP: 17.9, Intra-scenario PRES mAP: 18.6 |
| generalized-referring-expression | UNINEXT | N-acc.: 50.6, Precision@(F1=1, IoU≥0.5): 58.2 |
| instance-segmentation-on-coco | UNINEXT-H | AP50: 76.2, AP75: 56.7, APL: 67.5, APM: 55.9, APS: 33.3, mask AP: 51.8 |
| multi-object-tracking-and-segmentation-on-3 | UNINEXT-H | mMOTSA: 35.7 |
| multiple-object-tracking-on-bdd100k-val | UNINEXT-H | AssocA: -, TETA: -, mIDF1: 56.7, mMOTA: 44.2 |
| object-detection-on-coco-minival | UNINEXT-H | AP50: 77.5, AP75: 66.7, APL: 75.3, APM: 64.8, APS: 45.1, box AP: 60.6 |
| referring-expression-segmentation-on-davis | UNINEXT-H | J&F 1st frame: 72.5 |
| referring-expression-segmentation-on-refcoco | UNINEXT-H | Overall IoU: 82.19 |
| referring-expression-segmentation-on-refcoco-3 | UNINEXT-H | Overall IoU: 72.47 |
| referring-expression-segmentation-on-refcoco-4 | UNINEXT-H | Overall IoU: 76.42 |
| referring-expression-segmentation-on-refcoco-5 | UNINEXT-H | Overall IoU: 66.22 |
| referring-expression-segmentation-on-refer-1 | UNINEXT-H | F: 72.7, J: 67.6, J&F: 70.1 |
| video-instance-segmentation-on-ovis-1 | UNINEXT (ViT-H, Online) | AP50: 72.5, AP75: 52.2, mask AP: 49.0 |
| video-instance-segmentation-on-ovis-1 | UNINEXT (ResNet-50, Online) | AP50: 55.5, AP75: 35.6, mask AP: 34.0 |
| visual-object-tracking-on-lasot | UNINEXT-L | AUC: 72.4, Normalized Precision: 80.7, Precision: 78.9 |
| visual-object-tracking-on-lasot | UNINEXT-H | AUC: 72.2, Normalized Precision: 80.8, Precision: 79.4 |
| visual-object-tracking-on-lasot-ext | UNINEXT-H | AUC: 56.2, Normalized Precision: 63.8, Precision: 63.8 |
| visual-object-tracking-on-trackingnet | UNINEXT-H | Accuracy: 85.4, Normalized Precision: 89.0, Precision: 86.4 |
| visual-tracking-on-tnl2k | UNINEXT-H | AUC: 59.3, Precision: 62.8 |
| zero-shot-segmentation-on-segmentation-in-the | UNINEXT | Mean AP: 42.1 |