GKGNet: Group K-Nearest Neighbor based Graph Convolutional Network for Multi-Label Image Recognition
Yao Ruijie ; Jin Sheng ; Xu Lumin ; Zeng Wang ; Liu Wentao ; Qian Chen ; Luo Ping ; Wu Ji

Abstract
Multi-Label Image Recognition (MLIR) is a challenging task that aims to predict multiple object labels in a single image while modeling the complex relationships between labels and image regions. Although convolutional neural networks and vision transformers have succeeded in processing images as regular grids of pixels or patches, these representations are sub-optimal for capturing irregular and discontinuous regions of interest. In this work, we present the first fully graph convolutional model, Group K-nearest neighbor based Graph convolutional Network (GKGNet), which models the connections between semantic label embeddings and image patches in a flexible and unified graph structure. To address the scale variance of different objects and to capture information from multiple perspectives, we propose the Group KGCN module for dynamic graph construction and message passing. Our experiments demonstrate that GKGNet achieves state-of-the-art performance with significantly lower computational costs on the challenging multi-label datasets, i.e., MS-COCO and VOC2007 datasets. Codes are available at https://github.com/jin-s13/GKGNet.
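The abstract names group-wise k-nearest-neighbor graph construction but does not spell out the procedure. The following is a minimal illustrative sketch, not the authors' implementation: it assumes that feature channels are split into equal groups and that a label node picks its nearest image patches independently within each group. The function names `knn_indices` and `group_knn`, the Euclidean distance, and the toy vectors are all hypothetical choices made for this example.

```python
import math


def knn_indices(query, candidates, k):
    """Return indices of the k candidates closest to query (Euclidean distance)."""
    dists = [math.dist(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: dists[i])[:k]


def group_knn(query, candidates, num_groups, k):
    """Run kNN separately on each channel group (an assumed grouping scheme).

    Returns one neighbor list per group. The union of these lists can hold
    between k and num_groups * k distinct neighbors.
    """
    dim = len(query)
    assert dim % num_groups == 0, "channels must divide evenly into groups"
    size = dim // num_groups
    neighbors = []
    for g in range(num_groups):
        lo, hi = g * size, (g + 1) * size
        group_query = query[lo:hi]
        group_candidates = [c[lo:hi] for c in candidates]
        neighbors.append(knn_indices(group_query, group_candidates, k))
    return neighbors


# Toy data: one label embedding and three patch features, 4 channels each.
label = [0.0, 0.0, 1.0, 1.0]
patches = [
    [0.1, 0.0, 5.0, 5.0],  # close to the label in channel group 0 only
    [4.0, 4.0, 1.0, 1.1],  # close to the label in channel group 1 only
    [9.0, 9.0, 9.0, 9.0],  # far from the label in both groups
]
per_group = group_knn(label, patches, num_groups=2, k=1)
print(per_group)
```

In this toy case each group selects a different patch, so the label node ends up connected to two patches even though k is 1. That is one plausible reading of how grouping could let the number of connected regions vary with the object.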
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| multi-label-classification-on-ms-coco | GKGNet (resolution 576) | mAP: 87.7 |
| multi-label-classification-on-ms-coco | GKGNet (resolution 448) | mAP: 86.7 |
| multi-label-classification-on-ms-coco | GKGNet (resolution 224) | mAP: 82 |
| multi-label-classification-on-pascal-voc-2007 | GKGNet | mAP: 96.8 |
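The table reports mean average precision (mAP). As a rough illustration of what the metric measures, the sketch below computes average precision per class from ranked scores and averages over classes. It is an assumption-laden example: the exact AP variant used by each benchmark (for instance, interpolated versus non-interpolated) is not stated here, and the function names and toy scores are hypothetical.

```python
def average_precision(scores, targets):
    """AP for one class: mean precision at the rank of each positive sample."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if targets[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    # A class with no positives contributes 0.0 in this sketch.
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_rows, target_rows):
    """Average the per-class AP over all classes and scale to a percentage."""
    num_classes = len(score_rows[0])
    aps = []
    for c in range(num_classes):
        class_scores = [row[c] for row in score_rows]
        class_targets = [row[c] for row in target_rows]
        aps.append(average_precision(class_scores, class_targets))
    return 100.0 * sum(aps) / num_classes


# Toy data: 3 images, 2 classes (rows are images, columns are classes).
scores = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.6]]
targets = [[1, 0], [0, 1], [1, 1]]
map_value = mean_average_precision(scores, targets)
print(round(map_value, 2))
```

Because AP depends only on the ranking of scores within each class, no decision threshold is needed, which makes it convenient for comparing multi-label models.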