Reducing Transformer Depth on Demand with Structured Dropout

Angela Fan Edouard Grave Armand Joulin

Abstract

Overparameterized transformer networks have obtained state-of-the-art results in various natural language processing tasks, such as machine translation, language modeling, and question answering. These models contain hundreds of millions of parameters, necessitating a large amount of computation and making them prone to overfitting. In this work, we explore LayerDrop, a form of structured dropout, which has a regularization effect during training and allows for efficient pruning at inference time. In particular, we show that it is possible to select sub-networks of any depth from one large network without having to fine-tune them and with limited impact on performance. We demonstrate the effectiveness of our approach by improving the state of the art on machine translation, language modeling, summarization, question answering, and language understanding benchmarks. Moreover, we show that our approach leads to small BERT-like models of higher quality compared to training from scratch or using distillation.
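
The abstract describes LayerDrop only at a high level: whole layers are randomly skipped during training, so at inference a shallower sub-network can be selected without fine-tuning. The following is a minimal PyTorch sketch of that idea, not the authors' released implementation; the class name `LayerDropEncoder`, the `layerdrop` rate, and the `prune_every_other` helper are hypothetical, and keeping every other layer is just one simple pruning choice.

```python
import torch
import torch.nn as nn


class LayerDropEncoder(nn.Module):
    """Transformer encoder stack with structured (per-layer) dropout."""

    def __init__(self, num_layers: int, d_model: int = 512, nhead: int = 8,
                 layerdrop: float = 0.2):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for _ in range(num_layers)
        )
        self.layerdrop = layerdrop  # probability of skipping an entire layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            # During training, drop the whole layer with probability `layerdrop`;
            # the input is passed through unchanged (the residual path).
            if self.training and torch.rand(()) < self.layerdrop:
                continue
            x = layer(x)
        return x

    def prune_every_other(self) -> "LayerDropEncoder":
        # At inference time, select a shallower sub-network without fine-tuning,
        # here by keeping every other layer (one possible pruning strategy).
        self.layers = nn.ModuleList(list(self.layers)[::2])
        return self


# Hypothetical usage: train a 12-layer stack, then serve a 6-layer sub-network.
model = LayerDropEncoder(num_layers=12, layerdrop=0.2)
x = torch.randn(2, 16, 512)        # (batch, seq, d_model), batch_first=True
out = model(x)                      # stochastic depth during training
model.eval()
model.prune_every_other()           # pruned sub-network used directly
out = model(x)
```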

Code Repositories

prajjwal1/adaptive_transformer (PyTorch)
thunlp-mt/promptgating4mctg (PyTorch)
prajjwal1/fluence (PyTorch)
c00k1ez/plain-transformers (PyTorch)

Benchmarks

Benchmark: open-domain-question-answering-on-eli5
Methodology: Transformer Multitask + LayerDrop
Metrics: ROUGE-1: 29.4, ROUGE-2: 5.5, ROUGE-L: 23.4
