3 months ago

M3TR: Multi-modal Multi-label Recognition with Transformer

Jia Li, Yifan Zhao, Jiawei Zhao

Abstract

Multi-label image recognition aims to recognize multiple objects simultaneously in one image. Recent approaches have focused on learning label co-occurrence dependencies to enhance high-level semantic representations. However, these methods usually neglect the relations among intrinsic visual structures and have difficulty understanding contextual relationships. To capture global visual context as well as interactions between the visual and linguistic modalities, we propose the Multi-Modal Multi-label recognition TRansformers (M3TR), which learn a ternary relationship across inter- and intra-modalities. For the intra-modal relationship, we combine CNNs and Transformers, embedding visual structures into the high-level features by learning a semantic cross-attention. To construct interactions between the visual and linguistic modalities, we propose a linguistic cross-attention that embeds class-wise linguistic information into the visual structure learning, and finally present a linguistic guided enhancement module to enhance the representation of high-level semantics. Experiments show that, with collaborative learning of the ternary relationship, the proposed M3TR achieves new state-of-the-art results on two public multi-label recognition benchmarks.
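The abstract does not give the exact formulation of the linguistic cross-attention, so the following is only a minimal sketch of generic single-head scaled dot-product cross-attention, not the authors' implementation. It assumes that visual tokens act as queries and class-wise label embeddings act as keys and values; that direction, the function name `cross_attention`, the random projection matrices, and all dimensions are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries: (n_q, d_model) tokens that receive information
    context: (n_c, d_model) tokens that provide information
    Returns the attended features (n_q, d_head) and the
    attention weights (n_q, n_c).
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Hypothetical sizes: 49 visual tokens (a 7x7 feature map), 5 classes.
rng = np.random.default_rng(0)
d_model, d_head = 16, 8
num_patches, num_classes = 49, 5

visual_tokens = rng.normal(size=(num_patches, d_model))
label_embeddings = rng.normal(size=(num_classes, d_model))  # stand-in for word embeddings

# Random projections stand in for learned parameters.
w_q = rng.normal(size=(d_model, d_head)) * 0.1
w_k = rng.normal(size=(d_model, d_head)) * 0.1
w_v = rng.normal(size=(d_model, d_head)) * 0.1

# Assumption: visual tokens query the class-wise linguistic embeddings.
attended, weights = cross_attention(visual_tokens, label_embeddings, w_q, w_k, w_v)
print(attended.shape, weights.shape)
```

Each row of `weights` is a distribution over the class embeddings, so every visual token receives a class-weighted mixture of linguistic features. In a trained model the projections would be learned and the output would typically be merged back into the visual stream.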

Benchmarks

Benchmark | Methodology | Metrics
multi-label-classification-on-ms-coco | M3TR (ImageNet-21K-P pretraining, resolution 448) | mAP: 87.5
multi-label-classification-on-pascal-voc-2007 | M3TR (448×448) | mAP: 96.5
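For readers unfamiliar with the metric, mAP (mean average precision) in multi-label classification is commonly computed by ranking images by predicted score for each class, averaging the precision at every rank where a positive image appears, and then averaging over classes. The sketch below illustrates that non-interpolated variant on toy data; the exact evaluation protocol behind the numbers above is not stated here (Pascal VOC 2007, for instance, is often evaluated with an 11-point interpolated AP), and the function names and scores are made up for illustration.

```python
def average_precision(scores, labels):
    """Non-interpolated AP for one class.

    scores: predicted confidence per image
    labels: 1 if the class is present in the image, else 0
    """
    # Rank images from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precisions.append(hits / rank)  # precision at this positive
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_matrix, label_matrix):
    """Mean of per-class APs, reported as a percentage."""
    num_classes = len(score_matrix[0])
    aps = []
    for c in range(num_classes):
        class_scores = [row[c] for row in score_matrix]
        class_labels = [row[c] for row in label_matrix]
        aps.append(average_precision(class_scores, class_labels))
    return 100.0 * sum(aps) / len(aps)


# Toy example: 4 images, 2 classes (values are invented).
scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]]
labels = [[1, 0], [0, 1], [1, 1], [0, 0]]

print(round(mean_average_precision(scores, labels), 2))  # 91.67
```

In the toy data, class 0 has AP (1/1 + 2/3) / 2 ≈ 0.833 and class 1 has AP 1.0, which gives a mAP of about 91.67.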
