Tal Ridnik; Gilad Sharir; Avi Ben-Cohen; Emanuel Ben-Baruch; Asaf Noy

Abstract
In this paper, we introduce ML-Decoder, a new attention-based classification head. ML-Decoder predicts the existence of class labels via queries, and enables better utilization of spatial data compared to global average pooling. By redesigning the decoder architecture, and using a novel group-decoding scheme, ML-Decoder is highly efficient, and can scale well to thousands of classes. Compared to using a larger backbone, ML-Decoder consistently provides a better speed-accuracy trade-off. ML-Decoder is also versatile - it can be used as a drop-in replacement for various classification heads, and generalize to unseen classes when operated with word queries. Novel query augmentations further improve its generalization ability. Using ML-Decoder, we achieve state-of-the-art results on several classification tasks: on MS-COCO multi-label, we reach 91.4% mAP; on NUS-WIDE zero-shot, we reach 31.1% ZSL mAP; and on ImageNet single-label, we reach with vanilla ResNet50 backbone a new top score of 80.7%, without extra data or distillation. Public code is available at: https://github.com/Alibaba-MIIL/ML_Decoder
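To make the query-based decoding idea concrete, below is a minimal pure-Python sketch of how a set of queries can attend over spatial tokens and then be expanded into per-class logits via a group fully-connected step. This is an illustration of the general mechanism only, not the authors' implementation: function names, shapes, and the single-head attention are all simplifications (the actual ML-Decoder uses multi-head cross-attention, a feed-forward block, and learned or word-embedding queries, and drops the decoder's self-attention).

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attention(query, tokens):
    """Single-head cross-attention: one query vector attends over
    flattened spatial tokens and returns their weighted average."""
    d = len(query)
    scores = [sum(q * t for q, t in zip(query, tok)) / math.sqrt(d)
              for tok in tokens]
    weights = softmax(scores)
    # Weighted sum of the spatial tokens (a convex combination).
    return [sum(w * tok[j] for w, tok in zip(weights, tokens))
            for j in range(d)]

def ml_decoder_logits(queries, tokens, group_weights):
    """Group decoding sketch: each of the K queries produces one
    attended embedding, which a small per-group projection expands
    into `group_size` class logits, so K can be far smaller than
    the total number of classes."""
    logits = []
    for query, class_vectors in zip(queries, group_weights):
        attended = cross_attention(query, tokens)
        for w_class in class_vectors:
            logits.append(sum(a * w for a, w in zip(attended, w_class)))
    return logits

# Toy usage: 2 queries, 3 spatial tokens of dim 2, group size 2 -> 4 classes.
queries = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
group_weights = [
    [[1.0, 0.0], [0.0, 1.0]],   # classes 0-1, decoded from query 0
    [[1.0, 1.0], [0.5, 0.0]],   # classes 2-3, decoded from query 1
]
logits = ml_decoder_logits(queries, tokens, group_weights)
```

The point of the grouping is the speed-accuracy trade-off mentioned above: with a group size of g, only K = num_classes / g attention queries are needed, so the decoder's cost grows with K rather than with the (possibly thousands of) classes.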
Code Repositories
https://github.com/Alibaba-MIIL/ML_Decoder
Benchmarks
| Benchmark | Method | Metric |
|---|---|---|
| Fine-Grained Image Classification on Stanford Cars | TResNet-L + ML-Decoder | Accuracy: 96.41% |
| Image Classification on CIFAR-100 | Swin-L + ML-Decoder | Accuracy: 95.1% |
| Multi-Label Classification on MS-COCO | ML-Decoder (TResNet-XL, resolution 640) | mAP: 91.4 |
| Multi-Label Classification on MS-COCO | ML-Decoder (TResNet-L, resolution 640) | mAP: 91.1 |
| Multi-Label Classification on OpenImages V6 | TResNet-M | mAP: 86.8 |
| Multi-Label Zero-Shot Learning on NUS-WIDE | ML-Decoder | mAP: 31.1 |