5 months ago

Class-agnostic Object Detection with Multi-modal Transformer

Muhammad Maaz, Hanoona Rasheed, Salman Khan, Fahad Shahbaz Khan, Rao Muhammad Anwer, Ming-Hsuan Yang

Abstract

What constitutes an object? This has been a long-standing question in computer vision. Towards this goal, numerous learning-free and learning-based approaches have been developed to score objectness. However, they generally do not scale well across new domains and novel objects. In this paper, we advocate that existing methods lack a top-down supervision signal governed by human-understandable semantics. For the first time in literature, we demonstrate that Multi-modal Vision Transformers (MViT) trained with aligned image-text pairs can effectively bridge this gap. Our extensive experiments across various domains and novel objects show the state-of-the-art performance of MViTs to localize generic objects in images. Based on the observation that existing MViTs do not include multi-scale feature processing and usually require longer training schedules, we develop an efficient MViT architecture using multi-scale deformable attention and late vision-language fusion. We show the significance of MViT proposals in a diverse range of applications including open-world object detection, salient and camouflage object detection, supervised and self-supervised detection tasks. Further, MViTs can adaptively generate proposals given a specific language query and thus offer enhanced interactability. Code: https://git.io/J1HPY.

Code Repositories

mmaaz60/mvits_for_class_agnostic_od (Official, PyTorch)

Benchmarks

| Benchmark | Methodology | Metrics |
|---|---|---|
| object-detection-on-pascal-voc-10 | DETReg (MDef-DETR) | AP: 58.78; AP50: 80.46; AP75: 65.65 |
| object-detection-on-pascal-voc-2007 | DETReg (MDef-DETR) | AP50: 84.16; MAP: 84.16% |
| object-proposal-generation-on-coco | MDef-DETR (Off-the-shelf evaluation) | Average Recall: 0.6503 |
| object-proposal-generation-on-pascal-voc-2012 | MDef-DETR | Average Recall: 0.9126 |
| open-world-object-detection-on-coco-2017 | ORE (MDef-DETR) | A-OSE: 5212; MAP: 46.19; Unknown Recall: 49.54; WI: 0.0251 |
| open-world-object-detection-on-coco-2017-1 | ORE (MDef-DETR) | A-OSE: 4117; MAP: 36.75; Unknown Recall: 50.89; WI: 0.0179 |
| open-world-object-detection-on-coco-2017-2 | ORE (MDef-DETR) | MAP: 31.66 |
| open-world-object-detection-on-pascal-voc | ORE (MDef-DETR) | A-OSE: 7322; MAP: 64.03; Unknown Recall: 50.13; WI: 0.0474 |
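
The Average Recall figures above measure how well class-agnostic proposals cover ground-truth boxes. The exact evaluation protocol (number of proposals per image, IoU thresholds) is not stated on this page, so the following is only a minimal sketch under assumptions: boxes are (x1, y1, x2, y2) tuples, class labels are ignored, recall is averaged over IoU thresholds 0.50 to 0.95 in steps of 0.05, and a ground-truth box counts as recalled if any proposal overlaps it sufficiently (one-to-one matching, as used by some benchmark toolkits, is omitted). The function names are illustrative and are not taken from the official repository.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at(gt_boxes, proposals, thr):
    """Fraction of ground-truth boxes overlapped by some proposal with IoU >= thr."""
    if not gt_boxes:
        return 0.0
    hits = sum(1 for g in gt_boxes if any(iou(g, p) >= thr for p in proposals))
    return hits / len(gt_boxes)


def average_recall(gt_boxes, proposals, thresholds=None):
    """Mean recall over IoU thresholds (assumed default: 0.50, 0.55, ..., 0.95)."""
    if thresholds is None:
        thresholds = [0.5 + 0.05 * i for i in range(10)]
    return sum(recall_at(gt_boxes, proposals, t) for t in thresholds) / len(thresholds)


if __name__ == "__main__":
    # Toy data: class labels play no role, only box overlap matters.
    gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
    props = [(0, 0, 10, 10), (21, 21, 30, 30)]
    print(average_recall(gt, props))
```

Because labels are never consulted, the same routine applies to proposals produced for any text query; only the set of boxes changes.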
