LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference

Abstract

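We design a family of image classification architectures that optimize the trade-off between accuracy and efficiency in a high-speed regime. Our work builds on recent findings in attention-based architectures, which are competitive on highly parallel processing hardware. We revisit principles from the extensive literature on convolutional neural networks and apply them to transformers, in particular activation maps with decreasing resolutions. We also introduce the attention bias, a new way to integrate positional information in vision transformers. As a result, we propose LeViT: a hybrid neural network for fast-inference image classification. We consider different measures of efficiency on different hardware platforms, so as to best reflect a wide range of application scenarios. Our extensive experiments empirically validate our technical choices and show that they are suitable for most architectures. Overall, LeViT significantly outperforms existing convnets and vision transformers with respect to the speed/accuracy trade-off. For example, at 80% ImageNet top-1 accuracy, LeViT is 5 times faster than EfficientNet on CPU. The code is released at https://github.com/facebookresearch/LeViT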
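The abstract names the attention bias but does not spell out its form. The sketch below is one minimal reading of the idea, under a stated assumption: a learned scalar per absolute relative offset (|dx|, |dy|) is added to the attention logits before the softmax, so positional information enters through the attention map rather than through the token embeddings. The function name, the dictionary layout of the bias, and the single-head pure-Python form are all illustrative choices, not the authors' implementation.

```python
import math


def attention_with_bias(q, k, v, bias, width):
    """Single-head attention over a flattened grid with an additive positional bias.

    q, k, v: lists of N vectors (lists of floats), N = height * width,
             tokens stored row by row.
    bias:    dict mapping (|dx|, |dy|) -> float; one scalar per relative
             offset (assumed layout, for illustration only).
    width:   number of columns in the token grid.
    """
    n = len(q)
    scale = 1.0 / math.sqrt(len(q[0]))
    dim_v = len(v[0])
    out = []
    for i in range(n):
        yi, xi = divmod(i, width)
        logits = []
        for j in range(n):
            yj, xj = divmod(j, width)
            dot = sum(a * b for a, b in zip(q[i], k[j]))
            # The bias depends only on the relative offset between tokens i and j.
            logits.append(dot * scale + bias[(abs(xi - xj), abs(yi - yj))])
        # Numerically stable softmax over the biased logits.
        top = max(logits)
        weights = [math.exp(l - top) for l in logits]
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(weights[j] * v[j][d] for j in range(n)) for d in range(dim_v)])
    return out


# Hypothetical 2x2 grid with uninformative queries and keys.
q = [[0.0, 0.0] for _ in range(4)]
k = [[0.0, 0.0] for _ in range(4)]
v = [[1.0], [2.0], [3.0], [4.0]]
offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]
no_bias = {o: 0.0 for o in offsets}
self_bias = {o: (50.0 if o == (0, 0) else 0.0) for o in offsets}

uniform = attention_with_bias(q, k, v, no_bias, width=2)
local = attention_with_bias(q, k, v, self_bias, width=2)
```

With a zero bias every token averages all values; with a large bias on offset (0, 0) each token attends almost only to itself. The bias alone can therefore encode spatial locality even when the queries and keys carry no positional signal.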

Benchmarks

| Benchmark | Method | Metrics |
| --- | --- | --- |
| image-classification-on-cifar-10 | LeViT-128S | Percentage correct: 97.5 |
| image-classification-on-cifar-10 | LeViT-128 | Percentage correct: 97.6 |
| image-classification-on-cifar-10 | LeViT-384 | Percentage correct: 98 |
| image-classification-on-cifar-10 | LeViT-192 | Percentage correct: 98.2 |
| image-classification-on-cifar-10 | LeViT-256 | Percentage correct: 98.1 |
| image-classification-on-flowers-102 | LeViT-128S | Accuracy: 96.8 |
| image-classification-on-flowers-102 | LeViT-192 | Accuracy: 97.8 |
| image-classification-on-flowers-102 | LeViT-256 | Accuracy: 97.7 |
| image-classification-on-flowers-102 | LeViT-384 | Accuracy: 98.3 |
| image-classification-on-imagenet | LeViT-384 | GFLOPs: 2.334; Number of params: 39.4M; Top 1 Accuracy: 82.5% |
| image-classification-on-imagenet | LeViT-256 | GFLOPs: 1.066; Number of params: 17.8M; Top 1 Accuracy: 81.6% |
| image-classification-on-imagenet | LeViT-128S | GFLOPs: 0.288; Number of params: 4.7M; Top 1 Accuracy: 75.7% |
| image-classification-on-imagenet | LeViT-128 | GFLOPs: 0.376; Number of params: 8.8M; Top 1 Accuracy: 79.6% |
| image-classification-on-imagenet | LeViT-192 | GFLOPs: 0.624; Number of params: 10.4M; Top 1 Accuracy: 80% |
| image-classification-on-imagenet-real | LeViT-384 | Accuracy: 87.5% |
| image-classification-on-imagenet-real | LeViT-256 | Accuracy: 86.9% |
| image-classification-on-imagenet-real | LeViT-128 | Accuracy: 85.6% |
| image-classification-on-imagenet-real | LeViT-128S | Accuracy: 82.6% |
| image-classification-on-imagenet-real | LeViT-192 | Accuracy: 85.8% |
| image-classification-on-imagenet-v2 | LeViT-256 | Top 1 Accuracy: 69.9 |
| image-classification-on-imagenet-v2 | LeViT-192 | Top 1 Accuracy: 68.7 |
| image-classification-on-imagenet-v2 | LeViT-384 | Top 1 Accuracy: 71.4 |
| image-classification-on-imagenet-v2 | LeViT-128S | Top 1 Accuracy: 63.9 |
| image-classification-on-imagenet-v2 | LeViT-128 | Top 1 Accuracy: 67.5 |
| image-classification-on-inaturalist-2018 | LeViT-384 | Top-1 Accuracy: 66.9% |
| image-classification-on-inaturalist-2018 | LeViT-128S | Top-1 Accuracy: 55.2% |
| image-classification-on-inaturalist-2018 | LeViT-256 | Top-1 Accuracy: 66.2% |
| image-classification-on-inaturalist-2018 | LeViT-192 | Top-1 Accuracy: 60.4% |
| image-classification-on-inaturalist-2018 | LeViT-128 | Top-1 Accuracy: 54% |
| image-classification-on-inaturalist-2019 | LeViT-192 | Top-1 Accuracy: 70.8 |
| image-classification-on-inaturalist-2019 | LeViT-256 | Top-1 Accuracy: 72.3 |
| image-classification-on-inaturalist-2019 | LeViT-128 | Top-1 Accuracy: 68.4 |
| image-classification-on-inaturalist-2019 | LeViT-384 | Top-1 Accuracy: 74.3 |
| image-classification-on-inaturalist-2019 | LeViT-128S | Top-1 Accuracy: 66.5 |
| image-classification-on-stanford-cars | LeViT-128S | Accuracy: 88.4 |
| image-classification-on-stanford-cars | LeViT-256 | Accuracy: 88.2 |
| image-classification-on-stanford-cars | LeViT-384 | Accuracy: 89.3 |
| image-classification-on-stanford-cars | LeViT-128 | Accuracy: 88.6 |
| image-classification-on-stanford-cars | LeViT-192 | Accuracy: 89.8 |
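
The ImageNet entries above report both compute (GFLOPs) and top-1 accuracy, so the trade-off between them can be read off with a little arithmetic. The sketch below copies those five entries and computes how many top-1 points each step up in model size buys per extra GFLOP. The helper names and the "points per extra GFLOP" measure are illustrative choices made here, not metrics reported by the source.

```python
# ImageNet entries copied from the listing above:
# model -> (GFLOPs, parameters in millions, top-1 accuracy in %)
IMAGENET = {
    "LeViT-384": (2.334, 39.4, 82.5),
    "LeViT-256": (1.066, 17.8, 81.6),
    "LeViT-128S": (0.288, 4.7, 75.7),
    "LeViT-128": (0.376, 8.8, 79.6),
    "LeViT-192": (0.624, 10.4, 80.0),
}


def by_compute(results):
    """Return (name, stats) pairs sorted by increasing GFLOPs."""
    return sorted(results.items(), key=lambda item: item[1][0])


def marginal_gains(results):
    """Top-1 points gained per extra GFLOP between consecutive models."""
    ordered = by_compute(results)
    gains = []
    for (prev_name, prev), (name, cur) in zip(ordered, ordered[1:]):
        extra_gflops = cur[0] - prev[0]
        extra_points = cur[2] - prev[2]
        gains.append((prev_name, name, extra_points / extra_gflops))
    return gains


if __name__ == "__main__":
    for smaller, larger, gain in marginal_gains(IMAGENET):
        print(f"{smaller} -> {larger}: {gain:.1f} top-1 points per extra GFLOP")
```

On these numbers accuracy rises with every increase in compute, but the return per GFLOP is far larger at the small end of the family (LeViT-128S to LeViT-128) than at the large end (LeViT-256 to LeViT-384).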
