
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism

Yanping Huang; Youlong Cheng; Ankur Bapna; Orhan Firat; Mia Xu Chen; Dehao Chen; HyoukJoong Lee; Jiquan Ngiam; Quoc V. Le; Yonghui Wu; Zhifeng Chen

Abstract

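Scaling up deep neural network capacity has proven to be an effective way to improve model quality across many machine learning tasks. In many cases, growing a model beyond the memory limit of a single accelerator requires developing special algorithms or infrastructure. These solutions are often architecture-specific and do not transfer to other tasks. To address the need for efficient, task-independent model parallelism, we introduce GPipe, a pipeline parallelism library that can scale any network that can be expressed as a sequence of layers. By placing different sub-sequences of layers on separate accelerators, GPipe provides the flexibility to scale a wide variety of networks to gigantic sizes efficiently. Moreover, GPipe uses a novel batch-splitting pipelining algorithm that yields almost linear speedup when a model is partitioned across multiple accelerators. We demonstrate the advantages of GPipe by training large-scale neural networks on two tasks with distinct network architectures: (i) Image classification: we train a 557-million-parameter AmoebaNet model on the ImageNet-2012 dataset and reach a top-1 accuracy of 84.4%; (ii) Multilingual neural machine translation: we train a 6-billion-parameter, 128-layer Transformer model on a corpus spanning more than 100 languages and achieve better quality than all bilingual models.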
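The two ideas in the abstract, placing contiguous sub-sequences of layers on separate accelerators and splitting each mini-batch into micro-batches that flow through those stages, can be illustrated with a small simulation. The following is a minimal sketch rather than GPipe's actual implementation: every function name is hypothetical, scalar functions stand in for real layers, only the forward pass is modeled, no accelerators are involved, and layers are divided by count (an assumption made here for simplicity). Under this simple schedule, with K stages and M micro-batches, the idle fraction of the pipeline works out to (K - 1) / (M + K - 1).

```python
from typing import Callable, List, Sequence, Tuple

Layer = Callable[[float], float]


def split_contiguous(items: Sequence, parts: int) -> List[list]:
    """Split a sequence into `parts` contiguous chunks of near-equal length."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks


def forward_schedule(num_stages: int, num_micro: int) -> List[List[Tuple[int, int]]]:
    """At clock tick t, stage k works on micro-batch t - k (when that index is valid)."""
    ticks = []
    for t in range(num_stages + num_micro - 1):
        ticks.append([(k, t - k) for k in range(num_stages) if 0 <= t - k < num_micro])
    return ticks


def run_pipeline(stages: List[List[Layer]], micro_batches: List[List[float]]) -> List[float]:
    """Run the forward pass tick by tick and return the re-joined outputs."""
    acts = [list(mb) for mb in micro_batches]  # current activations per micro-batch
    for tick in forward_schedule(len(stages), len(micro_batches)):
        # The (stage, micro-batch) pairs within one tick touch different
        # micro-batches, so on real hardware they could run in parallel.
        for stage_idx, mb_idx in tick:
            for layer in stages[stage_idx]:
                acts[mb_idx] = [layer(x) for x in acts[mb_idx]]
    return [x for mb in acts for x in mb]


def bubble_fraction(num_stages: int, num_micro: int) -> float:
    """Fraction of (stage, tick) slots in which a stage sits idle."""
    ticks = forward_schedule(num_stages, num_micro)
    busy = sum(len(tick) for tick in ticks)
    return 1 - busy / (num_stages * len(ticks))


# Toy model: four scalar "layers" split into two stages (hypothetical example).
layers: List[Layer] = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
batch = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

stages = split_contiguous(layers, 2)
micro_batches = split_contiguous(batch, 4)
pipelined = run_pipeline(stages, micro_batches)
print(pipelined)
print(bubble_fraction(num_stages=2, num_micro=4))
```

Because each micro-batch still passes through every stage in order, the pipelined output matches what a plain sequential pass over the whole batch would produce. Raising the number of micro-batches relative to the number of stages shrinks the idle fraction, which is consistent with the near-linear speedup the abstract reports.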

Code repositories

Repository	Framework	Note
alondj/Pytorch-Gpipe	pytorch	Mentioned in GitHub
pikkaay/efficientnet_gpu	-	Mentioned in GitHub
qubvel/efficientnet	-	Mentioned in GitHub
xslidi/EfficientNets_ddl_apex	pytorch	Mentioned in GitHub
KakaoBrain/torchgpipe	pytorch	Mentioned in GitHub
ViswanathaReddyGajjala/EfficientNet-RetinaNet	pytorch	Mentioned in GitHub
northeastsquare/effficientnet	-	Mentioned in GitHub
PaddlePaddle/FleetX/tree/develop/examples/pipeline	paddle	-
tensorflow/lingvo	-	-
pytorch/pippy	pytorch	Mentioned in GitHub
shijianjian/efficientnet-pytorch-3d	pytorch	Mentioned in GitHub
yakhyo/EfficientNet-PyTorch	pytorch	Mentioned in GitHub
pytorch/tau	pytorch	Mentioned in GitHub

Benchmarks

Benchmark	Method	Metric
fine-grained-image-classification-on-birdsnap	GPIPE	Accuracy: 83.6%
fine-grained-image-classification-on-stanford	GPipe	Accuracy: 94.6%
image-classification-on-cifar-10	GPIPE + transfer learning	Percentage correct: 99
image-classification-on-cifar-100	GPIPE	Percentage correct: 91.3
image-classification-on-imagenet	GPIPE	Top 1 Accuracy: 84.4%
