3 months ago

Mega: Moving Average Equipped Gated Attention

Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, Luke Zettlemoyer

Abstract

The design choices in the Transformer attention mechanism, including weak inductive bias and quadratic computational complexity, have limited its application for modeling long sequences. In this paper, we introduce Mega, a simple, theoretically grounded, single-head gated attention mechanism equipped with (exponential) moving average to incorporate inductive bias of position-aware local dependencies into the position-agnostic attention mechanism. We further propose a variant of Mega that offers linear time and space complexity yet yields only minimal quality loss, by efficiently splitting the whole sequence into multiple chunks with fixed length. Extensive experiments on a wide range of sequence modeling benchmarks, including the Long Range Arena, neural machine translation, auto-regressive language modeling, and image and speech classification, show that Mega achieves significant improvements over other sequence models, including variants of Transformers and recent state space models.
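The two ideas named in the abstract, an exponential moving average (EMA) that injects position-aware local context and attention restricted to fixed-length chunks, can be illustrated with a minimal sketch. This is not the paper's implementation: it uses scalar features, a plain EMA recurrence with a fixed `alpha`, and ungated softmax attention, and the function names (`ema`, `split_chunks`, `chunked_attention`) are hypothetical. Feeding the smoothed sequence in as queries and keys is an illustrative choice for this example only.

```python
import math


def ema(xs, alpha):
    """Plain EMA: y_t = alpha * x_t + (1 - alpha) * y_{t-1}, with y_{-1} = 0.

    Assumption: a single fixed alpha in (0, 1]; the paper's learned,
    multi-dimensional variant is not reproduced here.
    """
    ys, prev = [], 0.0
    for x in xs:
        prev = alpha * x + (1.0 - alpha) * prev
        ys.append(prev)
    return ys


def split_chunks(seq, chunk_size):
    """Split a sequence into consecutive chunks of at most chunk_size items."""
    return [seq[i:i + chunk_size] for i in range(0, len(seq), chunk_size)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def chunked_attention(queries, keys, values, chunk_size):
    """Scalar softmax attention where each position attends only within its chunk.

    Each chunk costs O(chunk_size^2) and there are about n / chunk_size chunks,
    so the total cost is O(n * chunk_size): linear in n for a fixed chunk size.
    """
    out = []
    for start in range(0, len(queries), chunk_size):
        end = start + chunk_size  # slicing clips at the end of the sequence
        k, v = keys[start:end], values[start:end]
        for q in queries[start:end]:
            weights = softmax([q * kj for kj in k])
            out.append(sum(w * vj for w, vj in zip(weights, v)))
    return out


if __name__ == "__main__":
    x = [0.1, 0.9, 0.4, 0.7, 0.2, 0.8]
    smoothed = ema(x, alpha=0.5)
    # Illustrative choice: smoothed sequence as queries and keys, raw input as values.
    print(chunked_attention(smoothed, smoothed, x, chunk_size=2))
```

The trade-off in the chunked variant is that positions in different chunks never attend to each other directly. In this sketch, the only cross-chunk information comes from the EMA, whose recurrence runs over the whole sequence before chunking.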

Code Repositories

facebookresearch/mega (official; PyTorch; mentioned in GitHub)
ethanbar11/ssm_2d (PyTorch; mentioned in GitHub)
lucidrains/gated-state-spaces-pytorch (PyTorch; mentioned in GitHub)
huggingface/transformers (PyTorch; mentioned in GitHub)
linghao-jin/canmt-challenges (PyTorch; mentioned in GitHub)
ZIZUN/MAFiD (PyTorch; mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
image-classification-on-imagenet | Mega | Number of params: 90M; Top 1 Accuracy: 82.4%
language-modelling-on-wikitext-103 | Mega | Number of params: 252M; Test perplexity: 18.07
machine-translation-on-wmt2014-english-german | Mega | BLEU score: 29.01; Number of params: 67M; SacreBLEU: 27.96
machine-translation-on-wmt2014-german-english | Mega | BLEU score: 33.12
