
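Abstract
Large-kernel convolutional neural networks (ConvNets) have recently received extensive attention, but two critical issues remain unresolved. First, the architectures of existing large-kernel ConvNets largely follow the design principles of conventional ConvNets or Transformers, while architectural design specific to large-kernel ConvNets has not been explored systematically. Second, although Transformers have come to dominate multimodal tasks, it remains to be verified whether ConvNets also have a strong general perception ability in domains beyond vision. This paper contributes on both fronts. First, we propose four architectural guidelines for designing large-kernel ConvNets. Their core idea is to exploit the essential property that distinguishes large kernels from small ones: a large kernel can see wide without going deep, obtaining a large receptive field without stacking more layers. Following these guidelines, the proposed large-kernel ConvNet shows leading performance in image recognition: 88.0% accuracy on ImageNet, 55.6% mIoU on ADE20K, and 56.4% box AP on COCO object detection, delivering both higher accuracy and faster inference than recent strong competitors. Second, we find that large kernels are the key to unlocking strong ConvNet performance in domains where ConvNets were not originally proficient. With modality-specific preprocessing, the proposed model achieves state-of-the-art results on time-series forecasting and audio recognition, without any modality-specific customization of the architecture. All code and models are publicly available on GitHub and Hugging Face.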
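The "see wide without going deep" idea can be made concrete with standard receptive-field arithmetic. The sketch below is illustrative only and is not taken from the paper's code: the helper names `receptive_field` and `layers_needed`, and the choice of a 13x13 kernel for comparison, are assumptions made for this example. It computes the theoretical receptive field of a plain chain of convolutions, which is an upper bound rather than a measure of the effective receptive field.

```python
import math


def receptive_field(layers):
    """Theoretical receptive field of a chain of convolutions.

    `layers` is a list of (kernel_size, stride) tuples ordered from input
    to output. Dilation is assumed to be 1 (hypothetical simplification).
    """
    rf = 1    # a single output position initially "sees" one input position
    jump = 1  # distance in input pixels between adjacent positions at this depth
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


def layers_needed(target_rf, kernel_size):
    """Number of stride-1 layers of a given kernel size needed to reach target_rf."""
    return math.ceil((target_rf - 1) / (kernel_size - 1))


if __name__ == "__main__":
    # Six stacked 3x3 convolutions versus one 13x13 convolution (stride 1).
    print(receptive_field([(3, 1)] * 6))  # 13
    print(receptive_field([(13, 1)]))     # 13
    print(layers_needed(13, 3))           # 6
```

Under these assumptions, a single large kernel reaches in one layer the same theoretical receptive field that a stack of six 3x3 layers needs, which is the trade of depth for width that the abstract describes.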
Code repositories
ailab-cvc/unireplknet (official, PyTorch, mentioned on GitHub)
Westlake-AI/openmixup (PyTorch, mentioned on GitHub)
chenller/mmseg-extension (PyTorch, mentioned on GitHub)
Benchmarks

| Benchmark | Method | Metric |
|---|---|---|
| image-classification-on-imagenet | UniRepLKNet-B++ | Top 1 Accuracy: 87.4% |
| image-classification-on-imagenet | UniRepLKNet-S++ | Top 1 Accuracy: 86.4% |
| image-classification-on-imagenet | UniRepLKNet-T | Top 1 Accuracy: 83.2% |
| image-classification-on-imagenet | UniRepLKNet-XL++ | Top 1 Accuracy: 88% |
| image-classification-on-imagenet | UniRepLKNet-L++ | Top 1 Accuracy: 87.9% |
| image-classification-on-imagenet | UniRepLKNet-S | Top 1 Accuracy: 83.9% |
| image-classification-on-imagenet | UniRepLKNet-N | Top 1 Accuracy: 81.6% |
| image-classification-on-imagenet | UniRepLKNet-P | Top 1 Accuracy: 80.2% |
| image-classification-on-imagenet | UniRepLKNet-A | Top 1 Accuracy: 77% |
| image-classification-on-imagenet | UniRepLKNet-F | Top 1 Accuracy: 78.6% |
| object-detection-on-coco-2017 | UniRepLKNet-S++ | mAP: 54.3 |
| object-detection-on-coco-2017 | UniRepLKNet-T | mAP: 51.7 |
| object-detection-on-coco-2017 | UniRepLKNet-B++ | mAP: 54.8 |
| object-detection-on-coco-2017 | UniRepLKNet-S | mAP: 53 |
| object-detection-on-coco-2017 | UniRepLKNet-XL++ | mAP: 56.4 |
| object-detection-on-coco-2017 | UniRepLKNet-L++ | mAP: 55.8 |
| semantic-segmentation-on-ade20k | UniRepLKNet-T | Validation mIoU: 49.1 |
| semantic-segmentation-on-ade20k | UniRepLKNet-L++ | Validation mIoU: 55 |
| semantic-segmentation-on-ade20k | UniRepLKNet-B++ | Validation mIoU: 53.9 |
| semantic-segmentation-on-ade20k | UniRepLKNet-S++ | Validation mIoU: 52.7 |
| semantic-segmentation-on-ade20k | UniRepLKNet-XL | Validation mIoU: 55.6 |
| semantic-segmentation-on-ade20k | UniRepLKNet-S | Validation mIoU: 51 |