MogaNet: Multi-order Gated Aggregation Network
Siyuan Li, Zedong Wang, Zicheng Liu, Cheng Tan, Haitao Lin, Di Wu, Zhiyuan Chen, Jiangbin Zheng, Stan Z. Li

Abstract
By contextualizing the kernel as globally as possible, modern ConvNets have shown great potential in computer vision tasks. However, recent progress on multi-order game-theoretic interaction within deep neural networks (DNNs) reveals a representation bottleneck in modern ConvNets: expressive interactions are not effectively encoded as the kernel size increases. To tackle this challenge, we propose a new family of modern ConvNets, dubbed MogaNet, for discriminative visual representation learning in pure ConvNet-based models with favorable complexity-performance trade-offs. MogaNet encapsulates conceptually simple yet effective convolutions and gated aggregation into a compact module, in which discriminative features are efficiently gathered and contextualized adaptively. MogaNet exhibits great scalability, impressive parameter efficiency, and competitive performance compared to state-of-the-art ViTs and ConvNets on ImageNet and various downstream vision benchmarks, including COCO object detection, ADE20K semantic segmentation, 2D and 3D human pose estimation, and video prediction. Notably, MogaNet attains 80.0% and 87.8% top-1 accuracy with 5.2M and 181M parameters on ImageNet-1K, outperforming ParC-Net and ConvNeXt-L while saving 59% FLOPs and 17M parameters, respectively. The source code is available at https://github.com/Westlake-AI/MogaNet.
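The gist of the design, depthwise convolutions gathering multi-order context that a learned gate then re-weights, can be sketched in a few lines of PyTorch. This is a minimal illustration, not the authors' implementation: the module name, kernel sizes, dilation rates, and branch layout below are assumptions; see https://github.com/Westlake-AI/MogaNet for the reference code.

```python
import torch
import torch.nn as nn

class GatedAggregation(nn.Module):
    """Illustrative sketch of a gated-aggregation block (hypothetical layout)."""

    def __init__(self, dim: int):
        super().__init__()
        # Context branch: depthwise convolutions at increasing dilation rates,
        # intended to mix local, regional, and near-global interactions.
        self.dw_local = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        self.dw_region = nn.Conv2d(dim, dim, 5, padding=4, dilation=2, groups=dim)
        self.dw_global = nn.Conv2d(dim, dim, 7, padding=9, dilation=3, groups=dim)
        # Gating branch: a pointwise projection with a smooth nonlinearity.
        self.gate = nn.Conv2d(dim, dim, 1)
        self.act = nn.SiLU()
        self.proj = nn.Conv2d(dim, dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Aggregate multi-order context, then adaptively re-weight it
        # with the gate before the output projection.
        context = self.dw_local(x) + self.dw_region(x) + self.dw_global(x)
        return self.proj(self.act(self.gate(x)) * context)

block = GatedAggregation(64)
print(block(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```

All three context convolutions are depthwise (groups equal to channels), which keeps the block's cost close to that of a single large-kernel convolution while the gate supplies the adaptive re-weighting described in the abstract.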
Code Repositories
https://github.com/Westlake-AI/MogaNet (official)
Benchmarks
| Benchmark | Model | Metrics |
|---|---|---|
| Image Classification on ImageNet | MogaNet-XT (256 res.) | GFLOPs: 1.04 · Params: 3M · Top-1: 77.2% |
| Image Classification on ImageNet | MogaNet-T (256 res.) | GFLOPs: 1.44 · Params: 5.2M · Top-1: 80.0% |
| Image Classification on ImageNet | MogaNet-S | GFLOPs: 5.0 · Params: 25M · Top-1: 83.4% |
| Image Classification on ImageNet | MogaNet-B | GFLOPs: 9.9 · Params: 44M · Top-1: 84.3% |
| Image Classification on ImageNet | MogaNet-L | GFLOPs: 15.9 · Params: 83M · Top-1: 84.7% |
| Image Classification on ImageNet | MogaNet-XL (384 res.) | GFLOPs: 102 · Params: 181M · Top-1: 87.8% |
| Instance Segmentation on COCO | MogaNet-T | Mask AP: 35.8 |
| Instance Segmentation on COCO | MogaNet-XT | Mask AP: 37.6 |
| Instance Segmentation on COCO | MogaNet-T (Mask R-CNN 1x) | Mask AP: 39.1 |
| Instance Segmentation on COCO | MogaNet-S (Mask R-CNN 1x) | Mask AP: 42.2 |
| Instance Segmentation on COCO | MogaNet-B (Mask R-CNN 1x) | Mask AP: 43.2 |
| Instance Segmentation on COCO | MogaNet-L (Mask R-CNN 1x) | Mask AP: 44.1 |
| Instance Segmentation on COCO | MogaNet-S (Cascade Mask R-CNN) | Mask AP: 45.1 |
| Instance Segmentation on COCO | MogaNet-B (Cascade Mask R-CNN) | Mask AP: 46.0 |
| Instance Segmentation on COCO | MogaNet-L (Cascade Mask R-CNN) | Mask AP: 46.1 |
| Instance Segmentation on COCO | MogaNet-XL (Cascade Mask R-CNN) | Mask AP: 48.8 |
| Instance Segmentation on COCO val2017 | MogaNet-S (256x192) | AP50: 90.7 · AP75: 82.8 |
| Object Detection on COCO 2017 val | MogaNet-XT (RetinaNet 1x) | AP: 39.7 |
| Object Detection on COCO 2017 val | MogaNet-T (RetinaNet 1x) | AP: 41.4 |
| Object Detection on COCO 2017 val | MogaNet-S (RetinaNet 1x) | AP: 45.8 |
| Object Detection on COCO 2017 val | MogaNet-B (RetinaNet 1x) | AP: 47.7 |
| Object Detection on COCO 2017 val | MogaNet-L (RetinaNet 1x) | AP: 48.7 |
| Object Detection on COCO 2017 val | MogaNet-XT (Mask R-CNN 1x) | AP: 40.7 |
| Object Detection on COCO 2017 val | MogaNet-T (Mask R-CNN 1x) | AP: 42.6 |
| Object Detection on COCO 2017 val | MogaNet-S (Mask R-CNN 1x) | AP: 46.7 |
| Object Detection on COCO 2017 val | MogaNet-B (Mask R-CNN 1x) | AP: 47.9 |
| Object Detection on COCO 2017 val | MogaNet-L (Mask R-CNN 1x) | AP: 49.4 |
| Object Detection on COCO 2017 val | MogaNet-S (Cascade Mask R-CNN) | AP: 51.6 |
| Object Detection on COCO 2017 val | MogaNet-B (Cascade Mask R-CNN) | AP: 52.6 |
| Object Detection on COCO 2017 val | MogaNet-L (Cascade Mask R-CNN) | AP: 53.3 |
| Object Detection on COCO 2017 val | MogaNet-XL (Cascade Mask R-CNN) | AP: 56.2 |
| Pose Estimation on COCO val2017 | MogaNet-T (256x192) | AP: 73.2 · AP50: 90.1 · AP75: 81.0 · AR: 78.8 |
| Pose Estimation on COCO val2017 | MogaNet-S (256x192) | AP: 74.9 · AR: 80.1 |
| Pose Estimation on COCO val2017 | MogaNet-S (384x288) | AP: 76.4 · AP50: 91.0 · AP75: 83.3 · AR: 81.4 |
| Pose Estimation on COCO val2017 | MogaNet-B (384x288) | AP: 77.3 · AP50: 91.4 · AP75: 84.0 · AR: 82.2 |
| Semantic Segmentation on ADE20K | MogaNet-S (Semantic FPN) | GFLOPs (512x512): 189 · Val mIoU: 47.7 |
| Semantic Segmentation on ADE20K | MogaNet-S (UperNet) | GFLOPs (512x512): 946 · Val mIoU: 49.2 |
| Semantic Segmentation on ADE20K | MogaNet-B (UperNet) | GFLOPs (512x512): 1050 · Val mIoU: 50.1 |
| Semantic Segmentation on ADE20K | MogaNet-L (UperNet) | GFLOPs (512x512): 1176 · Val mIoU: 50.9 |
| Semantic Segmentation on ADE20K | MogaNet-XL (UperNet) | Val mIoU: 54.0 |
| Video Prediction on Moving MNIST | MogaNet (SimVP 10x) | MSE: 15.67 · MAE: 51.84 · SSIM: 0.9661 |
| Video Prediction on Moving MNIST | VAN (SimVP 10x) | MSE: 16.21 · MAE: 53.57 · SSIM: 0.9646 |
| Video Prediction on Moving MNIST | HorNet (SimVP 10x) | MSE: 17.40 · MAE: 55.70 · SSIM: 0.9624 |
| Video Prediction on Moving MNIST | ConvNeXt (SimVP 10x) | MSE: 17.58 · MAE: 55.76 · SSIM: 0.9617 |
| Video Prediction on Moving MNIST | Uniformer (SimVP 10x) | MSE: 18.01 · MAE: 57.52 |
| Video Prediction on Moving MNIST | MLP-Mixer (SimVP 10x) | MSE: 18.85 · MAE: 59.86 |
| Video Prediction on Moving MNIST | Swin (SimVP 10x) | MSE: 19.11 · MAE: 59.84 |
| Video Prediction on Moving MNIST | ViT (SimVP 10x) | MSE: 19.74 · MAE: 61.65 · SSIM: 0.9539 |
| Video Prediction on Moving MNIST | Poolformer (SimVP 10x) | MSE: 20.96 · MAE: 64.31 |
| Video Prediction on Moving MNIST | ConvMixer (SimVP 10x) | MSE: 22.30 · MAE: 67.37 |
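For the Moving MNIST rows above, a common SimVP-style convention is to report MSE and MAE as per-frame sums over pixels, averaged across frames and sequences, with SSIM averaged over frames. The snippet below assumes that convention as an illustration; the benchmark's own evaluation code is the authoritative definition.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def moving_mnist_metrics(pred: np.ndarray, true: np.ndarray):
    """pred, true: arrays in [0, 1] with shape (N, T, H, W)."""
    diff = pred - true
    # Squared/absolute error summed over each frame's pixels,
    # then averaged over all N * T frames (assumed convention).
    mse = float(np.mean(np.sum(diff ** 2, axis=(2, 3))))
    mae = float(np.mean(np.sum(np.abs(diff), axis=(2, 3))))
    # SSIM computed per frame and averaged.
    scores = [ssim(true[n, t], pred[n, t], data_range=1.0)
              for n in range(pred.shape[0]) for t in range(pred.shape[1])]
    return mse, mae, float(np.mean(scores))
```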