MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning

Guangxiang Zhao, Xu Sun, Jingjing Xu, Zhiyuan Zhang, Liangchen Luo

Abstract

In sequence to sequence learning, the self-attention mechanism has proven highly effective and has achieved significant improvements in many tasks. However, the self-attention mechanism is not without flaws. Although self-attention can model extremely long dependencies, the attention in deep layers tends to over-concentrate on a single token, leading to insufficient use of local information and difficulty in representing long sequences. In this work, we explore parallel multi-scale representation learning on sequence data, striving to capture both long-range and short-range language structures. To this end, we propose Parallel MUlti-Scale attEntion (MUSE) and MUSE-simple. MUSE-simple contains the basic idea of parallel multi-scale sequence representation learning: it encodes the sequence in parallel at different scales with the help of self-attention and pointwise transformation. MUSE builds on MUSE-simple and explores combining convolution with self-attention to learn sequence representations from a wider range of scales. We focus on machine translation, where the proposed approach achieves substantial performance improvements over Transformer, especially on long sequences. More importantly, we find that although conceptually simple, its success in practice requires intricate considerations, and the multi-scale attention must be built on a unified semantic space. Under common settings, the proposed model achieves substantial gains and outperforms all previous models on three main machine translation tasks. In addition, MUSE has potential for accelerating inference due to its parallelism. Code will be available at https://github.com/lancopku/MUSE
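
To make the parallel multi-scale idea concrete, below is a minimal PyTorch sketch of one block that runs self-attention (global scale), a depthwise convolution (local scale), and a pointwise feed-forward (token scale) in parallel after projecting the input into a shared semantic space. This is an illustrative assumption of how such a block could look, not the authors' implementation; module names, dimensions, and the way the branches are fused are chosen for readability, and the official code is at https://github.com/lancopku/MUSE.

```python
# Illustrative sketch of a parallel multi-scale block in the spirit of MUSE.
# NOT the official implementation; hyperparameters and branch fusion are assumptions.
import torch
import torch.nn as nn


class ParallelMultiScaleBlock(nn.Module):
    """Runs self-attention, depthwise convolution, and a pointwise
    feed-forward in parallel over a shared projection of the input."""

    def __init__(self, d_model=512, n_heads=8, kernel_size=3, d_ff=2048):
        super().__init__()
        # Shared projection so all branches operate in one semantic space.
        self.norm = nn.LayerNorm(d_model)
        self.shared_proj = nn.Linear(d_model, d_model)

        # Global scale: multi-head self-attention.
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

        # Local scale: depthwise 1-D convolution over the sequence.
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size // 2, groups=d_model)

        # Token scale: position-wise (pointwise) feed-forward.
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x, attn_mask=None):
        # x: (batch, seq_len, d_model)
        h = self.shared_proj(self.norm(x))

        attn_out, _ = self.self_attn(h, h, h, attn_mask=attn_mask)
        conv_out = self.conv(h.transpose(1, 2)).transpose(1, 2)
        ffn_out = self.ffn(h)

        # Parallel combination of the three scales plus a residual connection.
        return x + attn_out + conv_out + ffn_out


if __name__ == "__main__":
    block = ParallelMultiScaleBlock()
    out = block(torch.randn(2, 10, 512))
    print(out.shape)  # torch.Size([2, 10, 512])
```

Because the three branches are independent given the shared projection, they can be computed in parallel, which is the property the abstract points to as a potential source of inference speed-up.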

Code Repositories

lancopku/Prime (PyTorch)
lancopku/MUSE (official, PyTorch)

Benchmarks

Benchmark: Machine Translation on IWSLT2014 German-English
  Method: MUSE (Parallel Multi-Scale Attention)
  BLEU score: 36.3

Benchmark: Machine Translation on WMT2014 English-French
  Method: MUSE (Parallel Multi-Scale Attention)
  BLEU score: 43.5

Benchmark: Machine Translation on WMT2014 English-German
  Method: MUSE (Parallel Multi-Scale Attention)
  BLEU score: 29.9
