
Abstract
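Recent studies in image classification have demonstrated a variety of techniques for improving the performance of convolutional neural networks (CNNs). However, attempts to combine existing techniques into a practical model are still uncommon. In this study, we carry out extensive experiments to validate that carefully assembling these techniques and applying them to basic CNN models (e.g., ResNet and MobileNet) improves the accuracy and robustness of the models while minimizing the loss of throughput. On the ILSVRC2012 validation set, our proposed assembled ResNet-50 improves top-1 accuracy from 76.3% to 82.78%, mean corruption error (mCE) from 76.0% to 48.9%, and mean flip rate (mFR) from 57.7% to 32.3%. With these improvements, inference throughput decreases only from 536 to 312 frames per second. To verify the performance improvement in transfer learning, we tested fine-grained classification and image retrieval tasks on several public datasets; the results show that the improved backbone network performance significantly boosts transfer learning performance. Our approach achieved 1st place in the iFood Competition Fine-Grained Visual Recognition track at CVPR 2019. The source code and trained models are available at https://github.com/clovaai/assembled-cnn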
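The accuracy, robustness, and throughput trade-off quoted in the abstract can be made concrete with a little arithmetic. The sketch below is illustrative only: it uses just the before/after figures stated in the abstract, and the helper name `relative_change` and the `metrics` dictionary are hypothetical names introduced here, not part of the released code.

```python
# Before/after figures for the assembled ResNet-50, copied from the abstract.
# The dictionary layout and helper below are illustrative assumptions.
metrics = {
    "top1_acc": (76.3, 82.78),    # percent, higher is better
    "mCE": (76.0, 48.9),          # percent, lower is better
    "mFR": (57.7, 32.3),          # percent, lower is better
    "throughput": (536.0, 312.0), # frames per second, higher is better
}


def relative_change(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100.0


# For each metric: (absolute change, relative change in percent).
summary = {
    name: (round(after - before, 2), round(relative_change(before, after), 1))
    for name, (before, after) in metrics.items()
}

for name, (abs_delta, rel_delta) in summary.items():
    print(f"{name:>10}: {abs_delta:+.2f} absolute, {rel_delta:+.1f}% relative")
```

Read this way, the assembled model gains 6.48 points of top-1 accuracy and lowers mCE and mFR by 27.1 and 25.4 points respectively, at the cost of roughly 42% of the baseline inference throughput.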
Code Repositories
clovaai/assembled-cnn
Official
TensorFlow
Mentioned in GitHub
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| fine-grained-image-classification-on-fgvc | Assemble-ResNet-FGVC-50 | Accuracy: 92.4 |
| fine-grained-image-classification-on-food-101 | Assemble-ResNet-FGVC-50 | Accuracy: 92.5; Top 1 Accuracy: 92.47 |
| fine-grained-image-classification-on-oxford | Assemble-ResNet | Accuracy: 98.9% |
| fine-grained-image-classification-on-oxford-2 | Assemble-ResNet-FGVC-50 | Accuracy: 94.3%; Top-1 Error Rate: 5.7 |
| fine-grained-image-classification-on-sop | Assemble-ResNet-FGVC-50 | Recall@1: 85.9 |
| fine-grained-image-classification-on-stanford | Assemble-ResNet-FGVC-50 | Accuracy: 94.4% |
| image-classification-on-imagenet | Assemble-ResNet152 | GFLOPs: 15.8; Top 1 Accuracy: 84.2% |
| image-classification-on-imagenet-real | Assemble ResNet-50 | Accuracy: 87.82% |
| image-classification-on-imagenet-real | Assemble-ResNet152 | Accuracy: 88.65% |