Relaxed Attention for Transformer Models

Timo Lohrenz, Björn Möller, Zhengyang Li, Tim Fingscheidt


Abstract

The powerful modeling capabilities of all-attention-based transformer architectures often cause overfitting and - for natural language processing tasks - lead to an implicitly learned internal language model in the autoregressive transformer decoder, complicating the integration of external language models. In this paper, we explore relaxed attention, a simple and easy-to-implement smoothing of the attention weights, yielding a two-fold improvement to the general transformer architecture: First, relaxed attention provides regularization when applied to the self-attention layers in the encoder. Second, we show that it naturally supports the integration of an external language model as it suppresses the implicitly learned internal language model by relaxing the cross attention in the decoder. We demonstrate the benefit of relaxed attention across several tasks with clear improvement in combination with recent benchmark approaches. Specifically, we exceed the former state-of-the-art performance of 26.90% word error rate on the largest public lip-reading LRS3 benchmark with a word error rate of 26.31%, and we achieve a top-performing BLEU score of 37.67 on the IWSLT14 (DE$\rightarrow$EN) machine translation task without external language models and with virtually no additional model parameters. Code and models will be made publicly available.
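The abstract describes relaxed attention as a simple smoothing of the attention weights. Below is a minimal PyTorch sketch of this idea, assuming the smoothing blends the standard softmax attention weights with a uniform distribution over the key positions via a relaxation coefficient gamma; the function name, tensor shapes, and default gamma value are illustrative assumptions rather than details taken from the paper.

```python
import torch
import torch.nn.functional as F

def relaxed_attention(query, key, value, gamma=0.1):
    """Scaled dot-product attention with relaxed (smoothed) attention weights.

    The softmax attention weights are blended with a uniform distribution over
    the key positions using the relaxation coefficient `gamma`; gamma=0 recovers
    standard attention. Shapes (illustrative): query (B, T_q, d), key/value (B, T_k, d).
    """
    d = query.size(-1)
    scores = torch.matmul(query, key.transpose(-2, -1)) / d ** 0.5
    weights = F.softmax(scores, dim=-1)                      # standard attention weights
    uniform = torch.full_like(weights, 1.0 / key.size(-2))   # uniform distribution over key positions
    weights = (1.0 - gamma) * weights + gamma * uniform      # relaxed attention weights
    return torch.matmul(weights, value)
```

Per the abstract, such smoothing would be applied to the encoder self-attention layers for regularization and to the decoder cross attention to suppress the implicitly learned internal language model.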

Benchmarks

Benchmark | Methodology | Metrics
lipreading-on-lrs3-ted | AV-HuBERT Large + Relaxed Attention + LM | Word Error Rate (WER): 25.51
machine-translation-on-iwslt2014-german | Cutoff + Relaxed Attention + LM | BLEU score: 37.96; Number of Params: 24.1M
