MogaNet: Multi-order Gated Aggregation Network

Siyuan Li, Zedong Wang, Zicheng Liu, Cheng Tan, Haitao Lin, Di Wu, Zhiyuan Chen, Jiangbin Zheng, Stan Z. Li

Abstract

By contextualizing the kernel as globally as possible, modern ConvNets have shown great potential in computer vision tasks. However, recent progress on multi-order game-theoretic interaction within deep neural networks (DNNs) reveals a representation bottleneck in modern ConvNets: expressive interactions are not effectively encoded as the kernel size increases. To tackle this challenge, we propose a new family of modern ConvNets, dubbed MogaNet, for discriminative visual representation learning in pure ConvNet-based models with favorable complexity-performance trade-offs. MogaNet encapsulates conceptually simple yet effective convolutions and gated aggregation into a compact module, where discriminative features are efficiently gathered and contextualized adaptively. MogaNet exhibits great scalability, impressive parameter efficiency, and competitive performance compared to state-of-the-art ViTs and ConvNets on ImageNet and various downstream vision benchmarks, including COCO object detection, ADE20K semantic segmentation, 2D and 3D human pose estimation, and video prediction. Notably, MogaNet reaches 80.0% and 87.8% top-1 accuracy with 5.2M and 181M parameters on ImageNet-1K, outperforming ParC-Net and ConvNeXt-L while saving 59% FLOPs and 17M parameters, respectively. The source code is available at https://github.com/Westlake-AI/MogaNet.
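The two savings quoted at the end of the abstract are different kinds of quantity: the FLOPs figure is a relative saving, while the parameter figure is an absolute difference. The sketch below reproduces that arithmetic. The MogaNet numbers come from this page; the baseline costs for ParC-Net-S (about 3.5 GFLOPs) and ConvNeXt-L (about 198M parameters) do not appear on this page and are assumptions recalled from the literature, so treat the result as a plausibility check rather than an authoritative derivation.

```python
def relative_saving(ours: float, baseline: float) -> float:
    """Fraction of the baseline cost that is saved (0.25 means 25% cheaper)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 1.0 - ours / baseline


# Values listed on this page.
MOGANET_T_GFLOPS = 1.44        # MogaNet-T (256res)
MOGANET_XL_PARAMS_M = 181.0    # MogaNet-XL (384res), in millions

# ASSUMPTIONS: baseline costs are not stated on this page.
PARCNET_S_GFLOPS = 3.5         # assumed ParC-Net-S cost
CONVNEXT_L_PARAMS_M = 198.0    # assumed ConvNeXt-L size, in millions

# Relative saving for FLOPs, absolute difference for parameters.
flops_saving = relative_saving(MOGANET_T_GFLOPS, PARCNET_S_GFLOPS)
params_saved_m = CONVNEXT_L_PARAMS_M - MOGANET_XL_PARAMS_M

print(f"FLOPs saved vs. ParC-Net-S: {flops_saving:.0%}")
print(f"Parameters saved vs. ConvNeXt-L: {params_saved_m:.0f}M")
```

Under these assumed baselines, the computed values agree with the 59% and 17M figures stated in the abstract.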

Benchmarks

| Benchmark | Methodology | Metrics |
|---|---|---|
| image-classification-on-imagenet | MogaNet-XT (256res) | GFLOPs: 1.04; Number of params: 3M; Top 1 Accuracy: 77.2% |
| image-classification-on-imagenet | MogaNet-L | GFLOPs: 15.9; Number of params: 83M; Top 1 Accuracy: 84.7% |
| image-classification-on-imagenet | MogaNet-S | GFLOPs: 5; Number of params: 25M; Top 1 Accuracy: 83.4% |
| image-classification-on-imagenet | MogaNet-T (256res) | GFLOPs: 1.44; Number of params: 5.2M; Top 1 Accuracy: 80% |
| image-classification-on-imagenet | MogaNet-B | GFLOPs: 9.9; Number of params: 44M; Top 1 Accuracy: 84.3% |
| image-classification-on-imagenet | MogaNet-XL (384res) | GFLOPs: 102; Number of params: 181M; Top 1 Accuracy: 87.8% |
| instance-segmentation-on-coco | MogaNet-B (Cascade Mask R-CNN) | mask AP: 46 |
| instance-segmentation-on-coco | MogaNet-T | mask AP: 35.8 |
| instance-segmentation-on-coco | MogaNet-B (Mask R-CNN 1x) | mask AP: 43.2 |
| instance-segmentation-on-coco | MogaNet-L (Cascade Mask R-CNN) | mask AP: 46.1 |
| instance-segmentation-on-coco | MogaNet-S (Mask R-CNN 1x) | mask AP: 42.2 |
| instance-segmentation-on-coco | MogaNet-S (Cascade Mask R-CNN) | mask AP: 45.1 |
| instance-segmentation-on-coco | MogaNet-L (Mask R-CNN 1x) | mask AP: 44.1 |
| instance-segmentation-on-coco | MogaNet-XT | mask AP: 37.6 |
| instance-segmentation-on-coco | MogaNet-XL (Cascade Mask R-CNN) | mask AP: 48.8 |
| instance-segmentation-on-coco | MogaNet-T (Mask R-CNN 1x) | mask AP: 39.1 |
| instance-segmentation-on-coco-val2017 | MogaNet-S (256x192) | AP50: 90.7; AP75: 82.8 |
| object-detection-on-coco-2017-val | MogaNet-XL (Cascade Mask R-CNN) | AP: 56.2 |
| object-detection-on-coco-2017-val | MogaNet-S (RetinaNet 1x) | AP: 45.8 |
| object-detection-on-coco-2017-val | MogaNet-B (Cascade Mask R-CNN) | AP: 52.6 |
| object-detection-on-coco-2017-val | MogaNet-L (Mask R-CNN 1x) | AP: 49.4 |
| object-detection-on-coco-2017-val | MogaNet-S (Mask R-CNN 1x) | AP: 46.7 |
| object-detection-on-coco-2017-val | MogaNet-B (RetinaNet 1x) | AP: 47.7 |
| object-detection-on-coco-2017-val | MogaNet-L (Cascade Mask R-CNN) | AP: 53.3 |
| object-detection-on-coco-2017-val | MogaNet-XT (RetinaNet 1x) | AP: 39.7 |
| object-detection-on-coco-2017-val | MogaNet-L (RetinaNet 1x) | AP: 48.7 |
| object-detection-on-coco-2017-val | MogaNet-XT (Mask R-CNN 1x) | AP: 40.7 |
| object-detection-on-coco-2017-val | MogaNet-T (Mask R-CNN 1x) | AP: 42.6 |
| object-detection-on-coco-2017-val | MogaNet-B (Mask R-CNN 1x) | AP: 47.9 |
| object-detection-on-coco-2017-val | MogaNet-T (RetinaNet 1x) | AP: 41.4 |
| object-detection-on-coco-2017-val | MogaNet-S (Cascade Mask R-CNN) | AP: 51.6 |
| pose-estimation-on-coco-val2017 | MogaNet-S (256x192) | AP: 74.9; AR: 80.1 |
| pose-estimation-on-coco-val2017 | MogaNet-T (256x192) | AP: 73.2; AP50: 90.1; AP75: 81; AR: 78.8 |
| pose-estimation-on-coco-val2017 | MogaNet-B (384x288) | AP: 77.3; AP50: 91.4; AP75: 84; AR: 82.2 |
| pose-estimation-on-coco-val2017 | MogaNet-S (384x288) | AP: 76.4; AP50: 91; AP75: 83.3; AR: 81.4 |
| semantic-segmentation-on-ade20k | MogaNet-B (UperNet) | GFLOPs (512 x 512): 1050; Validation mIoU: 50.1 |
| semantic-segmentation-on-ade20k | MogaNet-L (UperNet) | GFLOPs (512 x 512): 1176; Validation mIoU: 50.9 |
| semantic-segmentation-on-ade20k | MogaNet-S (Semantic FPN) | GFLOPs (512 x 512): 189; Validation mIoU: 47.7 |
| semantic-segmentation-on-ade20k | MogaNet-S (UperNet) | GFLOPs (512 x 512): 946; Validation mIoU: 49.2 |
| semantic-segmentation-on-ade20k | MogaNet-XL (UperNet) | Validation mIoU: 54 |
| video-prediction-on-moving-mnist | VAN (SimVP 10x) | MAE: 53.57; MSE: 16.21; SSIM: 0.9646 |
| video-prediction-on-moving-mnist | Swin (SimVP 10x) | MAE: 59.84; MSE: 19.11 |
| video-prediction-on-moving-mnist | ConvMixer (SimVP 10x) | MAE: 67.37; MSE: 22.3 |
| video-prediction-on-moving-mnist | Uniformer (SimVP 10x) | MAE: 57.52; MSE: 18.01 |
| video-prediction-on-moving-mnist | MLP-Mixer (SimVP 10x) | MAE: 59.86; MSE: 18.85 |
| video-prediction-on-moving-mnist | ViT (SimVP 10x) | MAE: 61.65; MSE: 19.74; SSIM: 0.9539 |
| video-prediction-on-moving-mnist | MogaNet (SimVP 10x) | MAE: 51.84; MSE: 15.67; SSIM: 0.9661 |
| video-prediction-on-moving-mnist | ConvNeXt (SimVP 10x) | MAE: 55.76; MSE: 17.58; SSIM: 0.9617 |
| video-prediction-on-moving-mnist | HorNet (SimVP 10x) | MAE: 55.7; MSE: 17.4; SSIM: 0.9624 |
| video-prediction-on-moving-mnist | Poolformer (SimVP 10x) | MAE: 64.31; MSE: 20.96 |
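
The ImageNet entries above can be read as a scaling curve. As a rough illustration, the sketch below sorts those six rows by parameter count and computes the top-1 gain per additional million parameters between consecutive model sizes. The `Row` type and the helper names are hypothetical, introduced only for this example; the numbers are copied from the listing. The comparison is approximate because the XT and T variants are evaluated at 256 resolution and the XL variant at 384 resolution.

```python
from typing import List, NamedTuple, Tuple


class Row(NamedTuple):
    name: str
    gflops: float
    params_m: float   # parameters, in millions
    top1: float       # top-1 accuracy, in percent


# ImageNet classification rows as listed on this page.
IMAGENET_ROWS = [
    Row("MogaNet-XT (256res)", 1.04, 3.0, 77.2),
    Row("MogaNet-L", 15.9, 83.0, 84.7),
    Row("MogaNet-S", 5.0, 25.0, 83.4),
    Row("MogaNet-T (256res)", 1.44, 5.2, 80.0),
    Row("MogaNet-B", 9.9, 44.0, 84.3),
    Row("MogaNet-XL (384res)", 102.0, 181.0, 87.8),
]


def by_params(rows: List[Row]) -> List[Row]:
    """Return the rows ordered from smallest to largest model."""
    return sorted(rows, key=lambda r: r.params_m)


def marginal_gains(rows: List[Row]) -> List[Tuple[str, float]]:
    """Top-1 points gained per extra million parameters, per size step."""
    ordered = by_params(rows)
    return [
        (b.name, (b.top1 - a.top1) / (b.params_m - a.params_m))
        for a, b in zip(ordered, ordered[1:])
    ]


ordered_rows = by_params(IMAGENET_ROWS)
gains = marginal_gains(IMAGENET_ROWS)
for name, gain in gains:
    print(f"{name}: {gain:+.3f} top-1 points per extra 1M params")
```

Every step in this ordering yields a positive gain, which matches the scalability claim in the abstract, although the gain per parameter varies from step to step.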
