$\infty$-former: Infinite Memory Transformer

Pedro Henrique Martins, Zita Marinho, André F. T. Martins

Abstract

Transformers are unable to model long-term memories effectively, since the amount of computation they need to perform grows with the context length. While variations of efficient transformers have been proposed, they all have a finite memory capacity and are forced to drop old information. In this paper, we propose the $\infty$-former, which extends the vanilla transformer with an unbounded long-term memory. By making use of a continuous-space attention mechanism to attend over the long-term memory, the $\infty$-former's attention complexity becomes independent of the context length, trading off memory length with precision. In order to control where precision is more important, the $\infty$-former maintains "sticky memories", being able to model arbitrarily long contexts while keeping the computation budget fixed. Experiments on a synthetic sorting task, language modeling, and document grounded dialogue generation demonstrate the $\infty$-former's ability to retain information from long sequences.
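The abstract's central claim, that attention cost can be decoupled from context length, can be illustrated with a small sketch: the long-term memory is compressed into a fixed number of basis-function coefficients, and attention then operates on that fixed-size representation rather than on the raw token history. The NumPy sketch below is only an illustration under stated assumptions; the function names, sizes, RBF parameters, and the discrete softmax attention (the paper instead uses a continuous Gaussian attention density) are choices made here for brevity, not the authors' implementation.

```python
import numpy as np

# Hedged sketch of the idea behind the infinity-former's unbounded memory:
# an arbitrarily long sequence of hidden states is compressed into a fixed
# number of coefficients over radial basis functions (RBFs) on [0, 1], so
# attending over the memory no longer depends on how many tokens it spans.
# All names and sizes below are illustrative assumptions.

def rbf_basis(positions, num_basis, width=0.02):
    """Evaluate `num_basis` Gaussian RBFs (centers spread over [0, 1]) at the
    given positions. Returns a (len(positions), num_basis) design matrix."""
    centers = np.linspace(0.0, 1.0, num_basis)
    diff = positions[:, None] - centers[None, :]
    return np.exp(-0.5 * (diff / width) ** 2)

def compress_memory(hidden_states, num_basis=128, ridge=1e-3):
    """Fit coefficients B (num_basis x d) so that Phi @ B approximates the
    L x d matrix of hidden states; L can grow without changing B's size."""
    L, _ = hidden_states.shape
    positions = np.linspace(0.0, 1.0, L)            # map token index -> t in [0, 1]
    phi = rbf_basis(positions, num_basis)           # (L, num_basis)
    gram = phi.T @ phi + ridge * np.eye(num_basis)  # ridge regression for stability
    return np.linalg.solve(gram, phi.T @ hidden_states)  # (num_basis, d)

def attend_memory(query, coeffs, num_points=256):
    """Attend over the compressed memory by sampling it at `num_points`
    locations in [0, 1]; the cost depends on num_points and num_basis, not on
    the original context length. (A discrete softmax stands in here for the
    paper's continuous Gaussian attention.)"""
    ts = np.linspace(0.0, 1.0, num_points)
    phi = rbf_basis(ts, coeffs.shape[0])            # (num_points, num_basis)
    keys = phi @ coeffs                             # reconstructed memory samples
    scores = keys @ query / np.sqrt(query.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ keys                           # context vector, shape (d,)

# Toy usage: a 10k-token "past" is summarised by 128 coefficients.
rng = np.random.default_rng(0)
past = rng.normal(size=(10_000, 64))
memory = compress_memory(past, num_basis=128)
context = attend_memory(rng.normal(size=64), memory)
print(memory.shape, context.shape)                  # (128, 64) (64,)
```

Because `compress_memory` always returns a `num_basis × d` matrix, growing the past only changes the inputs to the regression, not the cost of `attend_memory`; this is the sense in which the memory is unbounded at a fixed computation budget, with precision traded off as more content is squeezed into the same number of basis functions.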

Code Repositories

Benchmarks

Benchmark | Methodology | Metrics
dialogue-generation-on-cmu-dog | ∞-former (Sticky memories) | F1: 9.01; METEOR: 7.55; ROUGE-1: 15.37; ROUGE-L: 12.56
dialogue-generation-on-pg-19 | ∞-former (Sticky memories + initialized GPT-2 Small) | Perplexity: 32.48
language-modelling-on-wikitext-103 | ∞-former (initialized GPT-2 Small) | Test perplexity: 16.64
language-modelling-on-wikitext-103 | ∞-former (Sticky memories) | Test perplexity: 24.22
language-modelling-on-wikitext-103 | ∞-former (Sticky memories + initialized GPT-2 Small) | Test perplexity: 16.61
