
R-Transformer: Recurrent Neural Network Enhanced Transformer

Zhiwei Wang; Yao Ma; Zitao Liu; Jiliang Tang

Abstract

Recurrent Neural Networks have long been the dominant choice for sequence modeling. However, they suffer from two severe issues: they are weak at capturing very long-term dependencies, and their sequential computation cannot be parallelized. Therefore, many non-recurrent sequence models built on convolution and attention operations have been proposed recently. Notably, models with multi-head attention, such as the Transformer, have demonstrated extreme effectiveness in capturing long-term dependencies in a variety of sequence modeling tasks. Despite their success, however, these models lack the necessary components to model local structures in sequences and rely heavily on position embeddings, which have limited effects and require considerable design effort. In this paper, we propose the R-Transformer, which enjoys the advantages of both RNNs and the multi-head attention mechanism while avoiding their respective drawbacks. The proposed model can effectively capture both local structures and global long-term dependencies in sequences without using position embeddings. We evaluate the R-Transformer through extensive experiments on data from a wide range of domains, and the empirical results show that the R-Transformer outperforms state-of-the-art methods by a large margin on most tasks. The code is publicly available at https://github.com/DSE-MSU/R-transformer.
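
The abstract describes a layer that uses a local RNN, rather than position embeddings, to capture local structure, and multi-head attention on top of it for global dependencies. Below is a minimal PyTorch sketch of one such layer under stated assumptions: the class name `RTransformerBlock`, the helper `local_rnn_layer`, and the default hyperparameters (`d_model`, `n_heads`, `window_size`) are illustrative choices of ours, not the authors' exact implementation; the official code at the repository linked above is the reference.

```python
import torch
import torch.nn as nn

class RTransformerBlock(nn.Module):
    """Sketch of one R-Transformer layer: a LocalRNN models short windows
    (local structure), then multi-head attention models global dependencies.
    Hyperparameters and layer arrangement are illustrative assumptions."""

    def __init__(self, d_model=128, n_heads=8, window_size=7):
        super().__init__()
        self.window_size = window_size
        # Plain RNN run independently over each fixed-size window; its
        # recurrence supplies order information, so no position embeddings.
        self.local_rnn = nn.RNN(d_model, d_model, batch_first=True)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.ReLU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def local_rnn_layer(self, x):
        # x: (batch, seq_len, d_model). For each position t, run the RNN
        # over the window [t - M + 1, t] (left-padded with zeros) and keep
        # the final hidden state, so each output sees only local context.
        b, n, d = x.shape
        m = self.window_size
        pad = x.new_zeros(b, m - 1, d)
        x_pad = torch.cat([pad, x], dim=1)          # (b, n + m - 1, d)
        windows = x_pad.unfold(1, m, 1)             # (b, n, d, m)
        windows = windows.permute(0, 1, 3, 2).reshape(b * n, m, d)
        _, h_last = self.local_rnn(windows)         # (1, b * n, d)
        return h_last.squeeze(0).view(b, n, d)

    def forward(self, x):
        x = self.norm1(x + self.local_rnn_layer(x))  # local structure
        attn_out, _ = self.attn(x, x, x)             # global dependencies
        x = self.norm2(x + attn_out)
        return self.norm3(x + self.ffn(x))
```

As a quick check, `RTransformerBlock()(torch.randn(2, 50, 128))` returns a tensor of the same shape, so blocks of this form can be stacked like ordinary Transformer layers.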

Code Repositories

DSE-MSU/R-transformer (official, PyTorch)
sfox14/butterfly-r-transformer (PyTorch)

Benchmarks

Benchmark | Methodology | Metric
language-modelling-on-penn-treebank-character | R-Transformer | Bit per Character (BPC): 1.24
language-modelling-on-penn-treebank-word | R-Transformer | Test perplexity: 84.38
music-modeling-on-nottingham | Transformer | NLL: 3.34
music-modeling-on-nottingham | R-Transformer | NLL: 2.37
sequential-image-classification-on-sequential | R-Transformer | Unpermuted Accuracy: 99.1%
