Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences
Andis Draguns; Emīls Ozoliņš; Agris Šostaks; Matīss Apinis; Kārlis Freivalds

Abstract
Attention is a commonly used mechanism in sequence processing, but its O(n^2) complexity prevents its application to long sequences. The recently introduced neural Shuffle-Exchange network offers a computationally efficient alternative, enabling the modelling of long-range dependencies in O(n log n) time. The model, however, is quite complex, involving a sophisticated gating mechanism derived from the Gated Recurrent Unit. In this paper, we present a simple and lightweight variant of the Shuffle-Exchange network, based on a residual network employing GELU and Layer Normalization. The proposed architecture not only scales to longer sequences but also converges faster and provides better accuracy. It surpasses the Shuffle-Exchange network on the LAMBADA language modelling task and achieves state-of-the-art performance on the MusicNet dataset for music transcription while remaining parameter-efficient. We show how to combine the improved Shuffle-Exchange network with convolutional layers, establishing it as a useful building block in long-sequence processing applications.
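To make the O(n log n) structure concrete, the sketch below shows a minimal PyTorch rendering of the idea described in the abstract: a residual switch unit built from Layer Normalization, GELU, and linear maps, applied to adjacent element pairs, with perfect-shuffle permutations between layers arranged in a Beneš (forward then inverse shuffle) pattern. This is an illustrative sketch, not the authors' implementation: the class names `ResidualSwitchUnit` and `BenesBlock`, the residual-scale initialization, and the weight-sharing scheme across layers are assumptions made for brevity.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualSwitchUnit(nn.Module):
    """Simplified residual switch unit: LayerNorm -> Linear -> GELU -> Linear,
    combined with the input pair via a learned residual scale.
    (Illustrative; the paper's exact scaling details are not reproduced.)"""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(2 * dim)
        self.fc1 = nn.Linear(2 * dim, 2 * dim)
        self.fc2 = nn.Linear(2 * dim, 2 * dim)
        self.res_scale = nn.Parameter(torch.full((2 * dim,), 0.9))  # assumed init

    def forward(self, pairs):  # pairs: (batch, n/2, 2*dim)
        h = self.fc2(F.gelu(self.fc1(self.norm(pairs))))
        return self.res_scale * pairs + h

def shuffle(x):
    """Perfect-shuffle permutation: riffle the two halves of the sequence."""
    b, n, d = x.shape
    return x.reshape(b, 2, n // 2, d).transpose(1, 2).reshape(b, n, d)

def inverse_shuffle(x):
    """Inverse of the perfect shuffle: de-interleave even/odd positions."""
    b, n, d = x.shape
    return x.reshape(b, n // 2, 2, d).transpose(1, 2).reshape(b, n, d)

class BenesBlock(nn.Module):
    """One Benes block over a length-n (power-of-two) sequence:
    log2(n)-1 switch+shuffle layers, then the mirror-image layers with the
    inverse shuffle, giving O(n log n) total work per block."""
    def __init__(self, dim, n):
        super().__init__()
        self.k = int(math.log2(n))
        # One shared unit per half: a weight-sharing assumption for brevity.
        self.fwd_unit = ResidualSwitchUnit(dim)
        self.rev_unit = ResidualSwitchUnit(dim)

    @staticmethod
    def _switch(x, unit):
        # Group adjacent positions into pairs, apply the unit, ungroup.
        b, n, d = x.shape
        return unit(x.reshape(b, n // 2, 2 * d)).reshape(b, n, d)

    def forward(self, x):  # x: (batch, n, dim)
        for _ in range(self.k - 1):
            x = shuffle(self._switch(x, self.fwd_unit))
        for _ in range(self.k - 1):
            x = inverse_shuffle(self._switch(x, self.rev_unit))
        return x

# Usage: a batch of 2 sequences of length 16 (= 2^4) with 8 feature maps.
x = torch.randn(2, 16, 8)
print(BenesBlock(dim=8, n=16)(x).shape)  # torch.Size([2, 16, 8])
```

Because each switch layer touches n/2 pairs and there are O(log n) layers per block, the cost grows as O(n log n) rather than the O(n^2) of attention.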
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| language-modelling-on-lambada | Residual Shuffle-Exchange network | Accuracy: 54.34 |
| music-transcription-on-musicnet | Residual Shuffle-Exchange network | APS: 78.02; Parameters: 3.06M |