Lookahead Optimizer: k steps forward, 1 step back

Michael R. Zhang; James Lucas; Geoffrey Hinton; Jimmy Ba

Abstract

The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms. Recent attempts to improve SGD can be broadly categorized into two approaches: (1) adaptive learning rate schemes, such as AdaGrad and Adam, and (2) accelerated schemes, such as heavy-ball and Nesterov momentum. In this paper, we propose a new optimization algorithm, Lookahead, that is orthogonal to these previous approaches and iteratively updates two sets of weights. Intuitively, the algorithm chooses a search direction by looking ahead at the sequence of fast weights generated by another optimizer. We show that Lookahead improves the learning stability and lowers the variance of its inner optimizer with negligible computation and memory cost. We empirically demonstrate Lookahead can significantly improve the performance of SGD and Adam, even with their default hyperparameter settings on ImageNet, CIFAR-10/100, neural machine translation, and Penn Treebank.
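
The core update described above can be stated compactly: the inner ("fast") optimizer takes k steps on the fast weights θ, after which the slow weights φ are pulled toward them via φ ← φ + α(θ − φ), and θ is then reset to φ. Below is a minimal PyTorch sketch of such a wrapper, assuming a standard optimizer interface; the hyperparameter names (k, alpha) follow the paper's notation, but the class and its interface are illustrative, not the official API (see michaelrzhang/lookahead under Code Repositories).

```python
import torch

class Lookahead:
    """Illustrative Lookahead wrapper: the inner optimizer advances the
    fast weights theta; every k steps the slow weights phi are updated as
    phi <- phi + alpha * (theta - phi), and theta is reset to phi."""

    def __init__(self, inner_optimizer, k=5, alpha=0.5):
        self.inner = inner_optimizer
        self.k = k
        self.alpha = alpha
        self.step_count = 0
        # Snapshot the initial parameters as the slow weights phi.
        self.slow_weights = [
            [p.detach().clone() for p in group["params"]]
            for group in self.inner.param_groups
        ]

    @torch.no_grad()
    def step(self):
        # One fast-weight step with the inner optimizer (SGD, Adam, ...).
        self.inner.step()
        self.step_count += 1
        if self.step_count % self.k == 0:
            for group, slow in zip(self.inner.param_groups, self.slow_weights):
                for p, phi in zip(group["params"], slow):
                    phi.add_(p - phi, alpha=self.alpha)  # phi += alpha * (theta - phi)
                    p.copy_(phi)                         # theta <- phi

    def zero_grad(self):
        self.inner.zero_grad()
```

Usage mirrors the wrapped optimizer: construct, say, torch.optim.Adam(model.parameters()), pass it to Lookahead(..., k=5, alpha=0.5), and call zero_grad() and step() each iteration as usual. Since the wrapper stores only one extra copy of the parameters and performs one interpolation every k steps, its overhead is small, consistent with the abstract's claim of negligible computation and memory cost.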

Code Repositories

alphadl/lookahead.pytorch (PyTorch)
chizhu/BDC2019 (TensorFlow)
mnikitin/LookaheadOptimizer-mx (MXNet)
kpe/params-flow (TensorFlow)
nsarang/lookahead_keras (TensorFlow)
rwightman/pytorch-image-models (PyTorch)
bojone/keras_lookahead
nachiket273/lookahead_pytorch (PyTorch)
wkcn/LookaheadOptimizer-mx (MXNet)
201419/Optimizer-PyTorch (PyTorch)
Abhimanyu08/Lookahead_Optimizer (PyTorch)
HamadYA/GhostFaceNets (TensorFlow)
michaelrzhang/lookahead (official, PyTorch)

Benchmarks

Benchmark                                        Method     Metrics
stochastic-optimization-on-cifar-10-resnet-18    Lookahead  Accuracy: 95.27%
stochastic-optimization-on-cifar-10-resnet-18    SGD        Accuracy: 95.23%
stochastic-optimization-on-cifar-10-resnet-18    Adam       Accuracy: 94.84%
stochastic-optimization-on-imagenet-resnet-50    SGD        Top-5 Accuracy: 92.15%
stochastic-optimization-on-imagenet-resnet-50    Lookahead  Top-1 Accuracy: 75.13%
stochastic-optimization-on-imagenet-resnet-50-1  Lookahead  Top-1 Accuracy: 75.49%, Top-5 Accuracy: 92.53%
stochastic-optimization-on-imagenet-resnet-50-1  SGD        Top-1 Accuracy: 75.15%, Top-5 Accuracy: 92.56%
