Lessons on Parameter Sharing across Layers in Transformers

Sho Takase, Shun Kiyono

Abstract

We propose a parameter sharing method for Transformers (Vaswani et al., 2017). The proposed approach relaxes a widely used technique that shares the parameters of one layer with all layers, as in Universal Transformers (Dehghani et al., 2019), in order to improve computational time efficiency. We propose three strategies for assigning parameters to layers: Sequence, Cycle, and Cycle (rev). Experimental results show that the proposed strategies are efficient in terms of both parameter size and computational time. Moreover, we show that the proposed strategies are also effective in settings with large amounts of training data, such as the recent WMT competitions.
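The abstract names the three assignment strategies but does not spell out their exact rules, so the sketch below is only one plausible reading of how M shared parameter sets might be mapped onto N Transformer layers. The names assign_layers and SharedLayerStack and the index formulas are illustrative assumptions, not the authors' implementation; the official repository listed below contains the actual code.

```python
# Illustrative sketch only: the index formulas below are assumptions inferred
# from the strategy names (Sequence, Cycle, Cycle (rev)), not the authors' code.
import torch
import torch.nn as nn


def assign_layers(n_layers: int, n_params: int, strategy: str) -> list[int]:
    """Map each of n_layers layers to one of n_params shared parameter sets."""
    if strategy == "sequence":
        # Consecutive layers share parameters, e.g. 0,0,1,1,2,2 for N=6, M=3.
        return [i * n_params // n_layers for i in range(n_layers)]
    if strategy == "cycle":
        # Parameter sets are reused in a repeating cycle, e.g. 0,1,2,0,1,2.
        return [i % n_params for i in range(n_layers)]
    if strategy == "cycle_rev":
        # Cycle, but with the final repetition reversed, e.g. 0,1,2,2,1,0.
        idx = [i % n_params for i in range(n_layers)]
        idx[n_layers - n_params:] = reversed(idx[n_layers - n_params:])
        return idx
    raise ValueError(f"unknown strategy: {strategy}")


class SharedLayerStack(nn.Module):
    """Encoder stack that instantiates only n_params distinct layers."""

    def __init__(self, d_model=512, n_heads=8, n_layers=6, n_params=3,
                 strategy="cycle_rev"):
        super().__init__()
        self.shared = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_params)
        )
        self.order = assign_layers(n_layers, n_params, strategy)

    def forward(self, x):
        # Apply n_layers layer computations while storing only n_params
        # parameter sets, which is where the parameter savings come from.
        for i in self.order:
            x = self.shared[i](x)
        return x


# Example: a 6-layer stack backed by 3 parameter sets.
stack = SharedLayerStack()
out = stack(torch.randn(2, 10, 512))  # (batch, sequence, d_model)
```

With the default configuration above, the six layers reuse the parameter sets in the order 0, 1, 2, 2, 1, 0, so the model performs six layers' worth of computation while storing only three layers' worth of parameters.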

Code Repositories

takase/share_layer_params (official implementation, PyTorch)
jaketae/param-share-transformer (PyTorch)

Benchmarks

Benchmark: Machine Translation on WMT2014 English-German
Methodology: Transformer Cycle (Rev)
Metrics: BLEU score: 35.14; SacreBLEU: 33.54
