Relaxed Attention for Transformer Models

Timo Lohrenz, Björn Möller, Zhengyang Li, Tim Fingscheidt


Abstract

The powerful modeling capabilities of all-attention-based transformer architectures often cause overfitting and - for natural language processing tasks - lead to an implicitly learned internal language model in the autoregressive transformer decoder, complicating the integration of external language models. In this paper, we explore relaxed attention, a simple and easy-to-implement smoothing of the attention weights, yielding a two-fold improvement to the general transformer architecture: First, relaxed attention provides regularization when applied to the self-attention layers in the encoder. Second, we show that it naturally supports the integration of an external language model as it suppresses the implicitly learned internal language model by relaxing the cross attention in the decoder. We demonstrate the benefit of relaxed attention across several tasks with clear improvement in combination with recent benchmark approaches. Specifically, we exceed the former state-of-the-art performance of 26.90% word error rate on the largest public lip-reading LRS3 benchmark with a word error rate of 26.31%, and we achieve a top-performing BLEU score of 37.67 on the IWSLT14 (DE$\rightarrow$EN) machine translation task without external language models and with virtually no additional model parameters. Code and models will be made publicly available.
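The abstract describes relaxed attention as a simple smoothing of the attention weights. Below is a minimal PyTorch sketch of this idea, assuming the smoothing blends the standard softmax attention weights with a uniform distribution over the key positions via a relaxation coefficient gamma; the function name, tensor shapes, and default gamma value are illustrative assumptions rather than details taken from the paper.

```python
import torch
import torch.nn.functional as F

def relaxed_attention(query, key, value, gamma=0.1):
    """Scaled dot-product attention with relaxed (smoothed) attention weights.

    The softmax attention weights are blended with a uniform distribution over
    the key positions using the relaxation coefficient `gamma`; gamma=0 recovers
    standard attention. Shapes (illustrative): query (B, T_q, d), key/value (B, T_k, d).
    """
    d = query.size(-1)
    scores = torch.matmul(query, key.transpose(-2, -1)) / d ** 0.5
    weights = F.softmax(scores, dim=-1)                      # standard attention weights
    uniform = torch.full_like(weights, 1.0 / key.size(-2))   # uniform distribution over key positions
    weights = (1.0 - gamma) * weights + gamma * uniform      # relaxed attention weights
    return torch.matmul(weights, value)
```

Per the abstract, such smoothing would be applied to the encoder self-attention layers for regularization and to the decoder cross attention to suppress the implicitly learned internal language model.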

Benchmarks

Benchmark | Methodology | Metrics
lipreading-on-lrs3-ted | AV-HuBERT Large + Relaxed Attention + LM | Word Error Rate (WER): 25.51
machine-translation-on-iwslt2014-german | Cutoff + Relaxed Attention + LM | BLEU score: 37.96; Number of Params: 24.1M
