
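Abstract
This paper revisits Densely Connected Convolutional Networks (DenseNets) and reveals their underrated effectiveness relative to the predominant ResNet-style architectures. We argue that the potential of DenseNets has been overlooked because outdated training methods and traditional design elements failed to show their full capabilities. Our pilot study shows that dense connections through concatenation are strong, which suggests that DenseNets can be revitalized to compete with modern architectures. We methodically refine the suboptimal components (architectural adjustments, block redesign, and improved training recipes), aiming to widen DenseNets and improve memory efficiency while keeping the concatenation shortcuts. Our models use simple architectural elements and ultimately surpass Swin Transformer, ConvNeXt, and DeiT-III, which are key architectures in the residual learning lineage. Our models also show near state-of-the-art performance on ImageNet-1K and are competitive with very recent models on downstream tasks such as ADE20k semantic segmentation and COCO object detection/instance segmentation. Finally, we provide empirical analyses that uncover the merits of concatenation shortcuts over additive shortcuts, steering renewed attention toward DenseNet-style designs. Our code is available at https://github.com/naver-ai/rdnet.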
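The difference between the two shortcut types named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the RDNet implementation: the helper names `additive_shortcut`, `concat_shortcut`, and `toy_layer` are hypothetical, feature maps are reduced to flat lists of floats, and the toy layer stands in for a real convolutional block. The sketch only shows how feature width behaves: an additive shortcut keeps the width fixed, while a concatenation shortcut grows it by the layer's output width at every step.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def additive_shortcut(x: Vector, layer: Layer) -> Vector:
    """ResNet-style shortcut: x + f(x). Input and output widths must match."""
    fx = layer(x)
    if len(fx) != len(x):
        raise ValueError("additive shortcut needs f(x) to have the same width as x")
    return [a + b for a, b in zip(x, fx)]


def concat_shortcut(x: Vector, layer: Layer) -> Vector:
    """DenseNet-style shortcut: [x, f(x)]. Width grows by len(f(x))."""
    return x + layer(x)  # list concatenation, not element-wise addition


def toy_layer(out_width: int) -> Layer:
    """Hypothetical stand-in for a conv block: emits the input mean, out_width times."""
    def f(x: Vector) -> Vector:
        mean = sum(x) / len(x)
        return [mean] * out_width
    return f


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]

    # Three additive layers: the width stays at 4.
    h_add = x
    for _ in range(3):
        h_add = additive_shortcut(h_add, toy_layer(out_width=4))
    print("additive width:", len(h_add))

    # Three concatenation layers, each adding 2 features: 4 + 3 * 2 = 10.
    h_cat = x
    for _ in range(3):
        h_cat = concat_shortcut(h_cat, toy_layer(out_width=2))
    print("concat width:", len(h_cat))
```

In the concatenation case every earlier feature is still present, unmodified, in the final list. In the additive case the earlier features are folded into a running sum.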
Code Repositories
naver-ai/rdnet
Official
pytorch
Mentioned in GitHub
huggingface/pytorch-image-models
Official
pytorch
Mentioned in GitHub
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| fine-grained-image-classification-on-stanford | RDNet-T (224 res, IN-1K pretrained) | Accuracy: 93.9% FLOPS: 5.0G PARAMS: 24M |
| fine-grained-image-classification-on-stanford | RDNet-L (224 res, IN-1K pretrained) | Accuracy: 94.2% FLOPS: 34.7G PARAMS: 186M |
| fine-grained-image-classification-on-stanford | RDNet-S (224 res, IN-1K pretrained) | Accuracy: 94.2% FLOPS: 8.7G PARAMS: 50M |
| fine-grained-image-classification-on-stanford | RDNet-B (224 res, IN-1K pretrained) | Accuracy: 94.1% FLOPS: 15.4G PARAMS: 87M |
| image-classification-on-cifar-10 | RDNet-L (224 res, IN-1K pretrained) | Percentage correct: 99.31 |
| image-classification-on-cifar-10 | RDNet-T (224 res, IN-1K pretrained) | Percentage correct: 98.88 |
| image-classification-on-cifar-10 | RDNet-B (224 res, IN-1K pretrained) | Percentage correct: 99.31 |
| image-classification-on-imagenet | RDNet-S | GFLOPs: 8.7 Number of params: 50M Top 1 Accuracy: 83.7% |
| image-classification-on-imagenet | RDNet-T | GFLOPs: 5.0 Number of params: 24M Top 1 Accuracy: 82.8% |
| image-classification-on-imagenet | RDNet-L | GFLOPs: 34.7 Number of params: 186M Top 1 Accuracy: 84.8% |
| image-classification-on-imagenet | RDNet-L (384 res) | GFLOPs: 34.7 Number of params: 186M Top 1 Accuracy: 85.8% |
| image-classification-on-imagenet | RDNet-B | GFLOPs: 15.4 Number of params: 87M Top 1 Accuracy: 84.4% |
| image-classification-on-inaturalist-2018 | RDNet-T (224 res, IN-1K pretrained) | Number of params: 24M Top-1 Accuracy: 77.0% |
| image-classification-on-inaturalist-2018 | RDNet-L (224 res, IN-1K pretrained) | Number of params: 186M Top-1 Accuracy: 81.8% |
| image-classification-on-inaturalist-2018 | RDNet-S (224 res, IN-1K pretrained) | Number of params: 50M Top-1 Accuracy: 79.1% |
| image-classification-on-inaturalist-2018 | RDNet-B (224 res, IN-1K pretrained) | Number of params: 87M Top-1 Accuracy: 80.5% |
| image-classification-on-inaturalist-2019 | RDNet-T (224 res, IN-1K pretrained) | Number of params: 24M Top-1 Accuracy: 81.2% |
| image-classification-on-inaturalist-2019 | RDNet-S (224 res, IN-1K pretrained) | Number of params: 50M Top-1 Accuracy: 82.9% |
| image-classification-on-inaturalist-2019 | RDNet-L (224 res, IN-1K pretrained) | Number of params: 186M Top-1 Accuracy: 83.7% |
| image-classification-on-inaturalist-2019 | RDNet-B (224 res, IN-1K pretrained) | Number of params: 87M Top-1 Accuracy: 83.5% |