Resurrecting Recurrent Neural Networks for Long Sequences

Antonio Orvieto, Samuel L. Smith, Albert Gu, Anushan Fernando, Caglar Gulcehre, Razvan Pascanu, Soham De

Abstract

Recurrent Neural Networks (RNNs) offer fast inference on long sequences but are hard to optimize and slow to train. Deep state-space models (SSMs) have recently been shown to perform remarkably well on long sequence modeling tasks, and have the added benefits of fast parallelizable training and RNN-like fast inference. However, while SSMs are superficially similar to RNNs, there are important differences that make it unclear where their performance boost over RNNs comes from. In this paper, we show that careful design of deep RNNs using standard signal propagation arguments can recover the impressive performance of deep SSMs on long-range reasoning tasks, while also matching their training speed. To achieve this, we analyze and ablate a series of changes to standard RNNs including linearizing and diagonalizing the recurrence, using better parameterizations and initializations, and ensuring proper normalization of the forward pass. Our results provide new insights on the origins of the impressive performance of deep SSMs, while also introducing an RNN block called the Linear Recurrent Unit that matches both their performance on the Long Range Arena benchmark and their computational efficiency.
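
To make the ingredients listed above concrete (a linear diagonal recurrence, an exponential parameterization of stable eigenvalues, and normalization of the forward pass), here is a minimal JAX sketch of an LRU-style layer. The function names, the real-valued B and C projections, and the initialization range r_min/r_max are illustrative assumptions rather than the authors' exact implementation; see the repositories listed below for faithful versions.

```python
import numpy as np
import jax
import jax.numpy as jnp


def init_lru_params(key, d_model, d_state, r_min=0.9, r_max=0.999):
    """Diagonal complex recurrence with eigenvalue magnitudes sampled in
    [r_min, r_max], so the recurrence is stable by construction."""
    k1, k2, k3, k4 = jax.random.split(key, 4)
    u1 = jax.random.uniform(k1, (d_state,))
    u2 = jax.random.uniform(k2, (d_state,))
    # Exponential parameterization: lambda = exp(-exp(nu_log) + i * theta).
    nu_log = jnp.log(-0.5 * jnp.log(u1 * (r_max**2 - r_min**2) + r_min**2))
    theta = 2.0 * jnp.pi * u2
    # Input/output projections (kept real here for brevity).
    B = jax.random.normal(k3, (d_state, d_model)) / np.sqrt(d_model)
    C = jax.random.normal(k4, (d_model, d_state)) / np.sqrt(d_state)
    return nu_log, theta, B, C


def lru_apply(params, u):
    """u: (seq_len, d_model) real inputs -> (seq_len, d_model) real outputs."""
    nu_log, theta, B, C = params
    lam = jnp.exp(-jnp.exp(nu_log) + 1j * theta)      # diagonal eigenvalues
    gamma = jnp.sqrt(1.0 - jnp.abs(lam) ** 2)         # forward-pass normalization
    bu = (u @ B.T).astype(jnp.complex64) * gamma      # (seq_len, d_state)

    def step(x_prev, bu_t):
        x_t = lam * x_prev + bu_t                     # linear diagonal recurrence
        return x_t, x_t

    x0 = jnp.zeros((lam.shape[0],), dtype=jnp.complex64)
    _, xs = jax.lax.scan(step, x0, bu)
    return (xs @ C.T).real                            # project back to the reals
```

In the full architecture this recurrent layer would be interleaved with nonlinear channel-mixing blocks, skip connections and normalization layers, and the sequential scan above would be replaced by a parallel (associative) scan to obtain the fast, parallelizable training the abstract refers to.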

Code Repositories

bojone/rnn
DecodEPFL/SSM (JAX)
Gothos/LRU-pytorch (PyTorch)
forgi86/lru-reduction (JAX)
esraaelelimy/LRU (JAX)
sustcsonglin/pytorch_linear_rnn (PyTorch)
nicolaszucchet/minimal-lru (JAX)

Benchmarks

Benchmark: sequential-image-classification-on-sequential-1
Methodology: LRU
Metrics: Unpermuted Accuracy: 89.0
