A ConvNet for the 2020s

Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie

Abstract

The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model. A vanilla ViT, on the other hand, faces difficulties when applied to general computer vision tasks such as object detection and semantic segmentation. It is the hierarchical Transformers (e.g., Swin Transformers) that reintroduced several ConvNet priors, making Transformers practically viable as a generic vision backbone and demonstrating remarkable performance on a wide variety of vision tasks. However, the effectiveness of such hybrid approaches is still largely credited to the intrinsic superiority of Transformers, rather than the inherent inductive biases of convolutions. In this work, we reexamine the design spaces and test the limits of what a pure ConvNet can achieve. We gradually "modernize" a standard ResNet toward the design of a vision Transformer, and discover several key components that contribute to the performance difference along the way. The outcome of this exploration is a family of pure ConvNet models dubbed ConvNeXt. Constructed entirely from standard ConvNet modules, ConvNeXts compete favorably with Transformers in terms of accuracy and scalability, achieving 87.8% ImageNet top-1 accuracy and outperforming Swin Transformers on COCO detection and ADE20K segmentation, while maintaining the simplicity and efficiency of standard ConvNets.

Code Repositories

Repositories mentioned on GitHub (framework in parentheses):

- k-h-ismail/convnext-dcls (PyTorch)
- dongkyuk/ConvNext-tensorflow (TensorFlow)
- hmichaeli/alias_free_convnets (PyTorch)
- mzeromiko/vmamba (PyTorch)
- sayakpaul/ConvNeXt-TF (TensorFlow)
- james77777778/keras-image-models (PyTorch)
- frgfm/Holocron (PyTorch)
- rwightman/pytorch-image-models (PyTorch)
- Westlake-AI/openmixup (PyTorch)
- Owais-Ansari/Unet3plus (PyTorch)
- hanfried/hanfried-bookmarks (PyTorch)
- duyhominhnguyen/LVM-Med (PyTorch)
- jmnolte/hccnet (PyTorch)
- AlassaneSakande/A-ConvNet-of-2020s (PyTorch)
- IMvision12/keras-vision-models (PyTorch)
- waterdisappear/nudt4mstar (PyTorch)
- lucidrains/denoising-diffusion-pytorch (PyTorch)
- yaya-yns/tart (PyTorch)
- avocardio/resnet_vs_convnext (TensorFlow)
- facebookresearch/ConvNeXt (official; PyTorch)
- mit-han-lab/litepose (PyTorch)
- tuanio/nextformer (PyTorch)
- facebookresearch/ppuda (PyTorch)
- Raghvender1205/ConvNeXt (PyTorch)
- DarshanDeshpande/jax-models (JAX)
- flytocc/ConvNeXt-paddle (Paddle)
- sithu31296/semantic-segmentation (PyTorch)
- 0jason000/convnext (MindSpore)
- zibbini/convnext-v2_tensorflow (TensorFlow)

Benchmarks

| Benchmark | Methodology | Metrics |
|---|---|---|
| classification-on-indl | ConvNext | Average Recall: 93.47% |
| domain-generalization-on-imagenet-a | ConvNeXt-XL (Im21k, 384) | Top-1 accuracy (%): 69.3 |
| domain-generalization-on-imagenet-c | ConvNeXt-XL (Im21k) (augmentation overlap with ImageNet-C) | Number of params: 350M; mean Corruption Error (mCE): 38.8 |
| domain-generalization-on-imagenet-r | ConvNeXt-XL (Im21k, 384) | Top-1 Error Rate: 31.8 |
| domain-generalization-on-imagenet-sketch | ConvNeXt-XL (Im21k, 384) | Top-1 accuracy: 55.0 |
| domain-generalization-on-vizwiz | ConvNeXt-B | Accuracy (All Images): 53.5; Accuracy (Clean Images): 56; Accuracy (Corrupted Images): 46.9 |
| image-classification-on-imagenet | ConvNeXt-XL (ImageNet-22k) | GFLOPs: 179; Number of params: 350M; Top-1 Accuracy: 87.8% |
| image-classification-on-imagenet | Adlik-ViT-SG+Swin_large+Convnext_xlarge(384) | Number of params: 1827M; Top-1 Accuracy: 88.36% |
| image-classification-on-imagenet | ConvNeXt-L (384 res) | GFLOPs: 101; Number of params: 198M; Top-1 Accuracy: 85.5% |
| image-classification-on-imagenet | ConvNeXt-T | GFLOPs: 4.5; Number of params: 29M; Top-1 Accuracy: 82.1% |
| object-detection-on-coco-o | ConvNeXt-XL (Cascade Mask R-CNN) | Average mAP: 37.5; Effective Robustness: 12.68 |
| semantic-segmentation-on-ade20k | ConvNeXt-S | GFLOPs (512 x 512): 1027; Params (M): 82; Validation mIoU: 49.6 |
| semantic-segmentation-on-ade20k | ConvNeXt-B++ | GFLOPs (512 x 512): 1828; Params (M): 122; Validation mIoU: 53.1 |
| semantic-segmentation-on-ade20k | ConvNeXt-B | GFLOPs (512 x 512): 1170; Params (M): 122; Validation mIoU: 49.9 |
| semantic-segmentation-on-ade20k | ConvNeXt-T | GFLOPs (512 x 512): 939; Params (M): 60; Validation mIoU: 46.7 |
| semantic-segmentation-on-ade20k | ConvNeXt-L++ | GFLOPs (512 x 512): 2458; Params (M): 235; Validation mIoU: 53.7 |
| semantic-segmentation-on-ade20k | ConvNeXt-XL++ | GFLOPs (512 x 512): 3335; Params (M): 391; Validation mIoU: 54.0 |
| semantic-segmentation-on-imagenet-s | ConvNext-Tiny (P4, 224x224, SUP) | mIoU (test): 48.8; mIoU (val): 48.7 |
