Bin Yan, Yi Jiang, Jiannan Wu, Dong Wang, Ping Luo, Zehuan Yuan, Huchuan Lu

Abstract
All instance perception tasks aim at finding certain objects specified by some queries such as category names, language expressions, and target annotations, but this complete field has been split into multiple independent subtasks. In this work, we present a universal instance perception model of the next generation, termed UNINEXT. UNINEXT reformulates diverse instance perception tasks into a unified object discovery and retrieval paradigm and can flexibly perceive different types of objects by simply changing the input prompts. This unified formulation brings the following benefits: (1) enormous data from different tasks and label vocabularies can be exploited for jointly training general instance-level representations, which is especially beneficial for tasks lacking in training data. (2) The unified model is parameter-efficient and can save redundant computation when handling multiple tasks simultaneously. UNINEXT shows superior performance on 20 challenging benchmarks from 10 instance-level tasks including classical image-level tasks (object detection and instance segmentation), vision-and-language tasks (referring expression comprehension and segmentation), and six video-level object tracking tasks. Code is available at https://github.com/MasterBin-IIAU/UNINEXT.
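The abstract's central idea, tasks unified as "prompt in, matching instances out", can be illustrated with a minimal sketch. This is purely hypothetical pseud
-style code, not the UNINEXT API: it assumes the model has already produced per-instance embeddings, and shows how retrieval against a prompt embedding (from category names, a language expression, or a target annotation) would reduce every task to the same scoring step.

```python
import numpy as np

def retrieve_instances(prompt_emb, instance_embs, threshold=0.5):
    """Score discovered instances against a task prompt and keep matches.

    prompt_emb:    (D,) embedding of the prompt (category name, language
                   expression, or target-annotation crop -- illustrative)
    instance_embs: (N, D) embeddings of N candidate instances
    Returns (indices of matches, cosine-similarity scores).
    """
    p = prompt_emb / np.linalg.norm(prompt_emb)
    x = instance_embs / np.linalg.norm(instance_embs, axis=1, keepdims=True)
    scores = x @ p                       # cosine similarity per instance
    return np.where(scores > threshold)[0], scores

# Toy example: 3 candidate instances; the prompt is closest to instance 0.
prompt = np.array([1.0, 0.0, 0.0])
instances = np.array([[0.9, 0.1, 0.0],   # strong match
                      [0.0, 1.0, 0.0],   # unrelated
                      [0.6, 0.6, 0.0]])  # partial match
idx, scores = retrieve_instances(prompt, instances)
print(idx.tolist())  # instances 0 and 2 exceed the threshold
```

Under this view, switching between detection, referring segmentation, and tracking only changes how `prompt_emb` is produced; the discovery and retrieval machinery stays shared, which is what enables joint training and parameter sharing across tasks.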
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| described-object-detection-on-description | UNINEXT-large | Intra-scenario ABS mAP: 15.9, Intra-scenario FULL mAP: 17.9, Intra-scenario PRES mAP: 18.6 |
| generalized-referring-expression | UNINEXT | N-acc.: 50.6, Precision@(F1=1, IoU≥0.5): 58.2 |
| instance-segmentation-on-coco | UNINEXT-H | AP50: 76.2, AP75: 56.7, APL: 67.5, APM: 55.9, APS: 33.3, mask AP: 51.8 |
| multi-object-tracking-and-segmentation-on-3 | UNINEXT-H | mMOTSA: 35.7 |
| multiple-object-tracking-on-bdd100k-val | UNINEXT-H | AssocA: -, TETA: -, mIDF1: 56.7, mMOTA: 44.2 |
| object-detection-on-coco-minival | UNINEXT-H | AP50: 77.5, AP75: 66.7, APL: 75.3, APM: 64.8, APS: 45.1, box AP: 60.6 |
| referring-expression-segmentation-on-davis | UNINEXT-H | J&F 1st frame: 72.5 |
| referring-expression-segmentation-on-refcoco | UNINEXT-H | Overall IoU: 82.19 |
| referring-expression-segmentation-on-refcoco-3 | UNINEXT-H | Overall IoU: 72.47 |
| referring-expression-segmentation-on-refcoco-4 | UNINEXT-H | Overall IoU: 76.42 |
| referring-expression-segmentation-on-refcoco-5 | UNINEXT-H | Overall IoU: 66.22 |
| referring-expression-segmentation-on-refer-1 | UNINEXT-H | F: 72.7, J: 67.6, J&F: 70.1 |
| video-instance-segmentation-on-ovis-1 | UNINEXT (ViT-H, Online) | AP50: 72.5, AP75: 52.2, mask AP: 49.0 |
| video-instance-segmentation-on-ovis-1 | UNINEXT (ResNet-50, Online) | AP50: 55.5, AP75: 35.6, mask AP: 34.0 |
| visual-object-tracking-on-lasot | UNINEXT-L | AUC: 72.4, Normalized Precision: 80.7, Precision: 78.9 |
| visual-object-tracking-on-lasot | UNINEXT-H | AUC: 72.2, Normalized Precision: 80.8, Precision: 79.4 |
| visual-object-tracking-on-lasot-ext | UNINEXT-H | AUC: 56.2, Normalized Precision: 63.8, Precision: 63.8 |
| visual-object-tracking-on-trackingnet | UNINEXT-H | Accuracy: 85.4, Normalized Precision: 89.0, Precision: 86.4 |
| visual-tracking-on-tnl2k | UNINEXT-H | AUC: 59.3, Precision: 62.8 |
| zero-shot-segmentation-on-segmentation-in-the | UNINEXT | Mean AP: 42.1 |