Convolutional Sequence Modeling Revisited

Shaojie Bai, J. Zico Kolter, Vladlen Koltun

Abstract

This paper revisits the problem of sequence modeling using convolutional architectures. Although both convolutional and recurrent architectures have a long history in sequence prediction, the current "default" mindset in much of the deep learning community is that generic sequence modeling is best handled using recurrent networks. The goal of this paper is to question this assumption. Specifically, we consider a simple generic temporal convolution network (TCN), which adopts features from modern ConvNet architectures such as dilations and residual connections. We show that on a variety of sequence modeling tasks, including many frequently used as benchmarks for evaluating recurrent networks, the TCN outperforms baseline RNN methods (LSTMs, GRUs, and vanilla RNNs) and sometimes even highly specialized approaches. We further show that the potential "infinite memory" advantage that RNNs have over TCNs is largely absent in practice: TCNs indeed exhibit longer effective history sizes than their recurrent counterparts. As a whole, we argue that it may be time to (re)consider ConvNets as the default "go to" architecture for sequence modeling.
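
To make the described architecture concrete, below is a minimal sketch of a TCN residual block in PyTorch. The module names (CausalConv1d, TCNBlock) and hyperparameters are illustrative assumptions, not the paper's exact implementation, which includes further details (e.g., weight normalization and dropout) omitted here.

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """1D convolution whose output at time t depends only on inputs up to t."""
    def __init__(self, in_ch, out_ch, kernel_size, dilation):
        super().__init__()
        # Left-pad by (kernel_size - 1) * dilation so the convolution is
        # causal and preserves the sequence length.
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):  # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.pad, 0))
        return self.conv(x)

class TCNBlock(nn.Module):
    """Residual block: two causal dilated convolutions plus a skip connection."""
    def __init__(self, in_ch, out_ch, kernel_size, dilation):
        super().__init__()
        self.conv1 = CausalConv1d(in_ch, out_ch, kernel_size, dilation)
        self.conv2 = CausalConv1d(out_ch, out_ch, kernel_size, dilation)
        self.relu = nn.ReLU()
        # 1x1 convolution matches channel counts for the residual sum.
        self.downsample = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else None

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.relu(self.conv2(out))
        res = x if self.downsample is None else self.downsample(x)
        return self.relu(out + res)

# Doubling the dilation per block (1, 2, 4, 8) makes the receptive field
# grow exponentially with depth, giving the long effective history
# the abstract refers to.
tcn = nn.Sequential(*[TCNBlock(32, 32, kernel_size=3, dilation=2 ** i)
                      for i in range(4)])
y = tcn(torch.randn(8, 32, 100))  # output shape: (batch=8, channels=32, time=100)
```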

Benchmarks

Benchmark: language-modelling-on-wikitext-103
Methodology: Temporal CNN
Test perplexity: 45.2
Validation perplexity: -
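
For reference, perplexity is the exponential of the average per-token negative log-likelihood. A quick illustrative check (the loss value below is back-computed from the reported perplexity, not taken from the paper):

```python
import math

mean_nll = 3.811                 # hypothetical average per-token NLL, in nats
perplexity = math.exp(mean_nll)  # ~45.2, consistent with the table above
print(f"perplexity = {perplexity:.1f}")
```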
