Lookahead Optimizer: k steps forward, 1 step back

Michael R. Zhang; James Lucas; Geoffrey Hinton; Jimmy Ba

Abstract

The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms. Recent attempts to improve SGD can be broadly categorized into two approaches: (1) adaptive learning rate schemes, such as AdaGrad and Adam, and (2) accelerated schemes, such as heavy-ball and Nesterov momentum. In this paper, we propose a new optimization algorithm, Lookahead, that is orthogonal to these previous approaches and iteratively updates two sets of weights. Intuitively, the algorithm chooses a search direction by looking ahead at the sequence of fast weights generated by another optimizer. We show that Lookahead improves the learning stability and lowers the variance of its inner optimizer with negligible computation and memory cost. We empirically demonstrate Lookahead can significantly improve the performance of SGD and Adam, even with their default hyperparameter settings on ImageNet, CIFAR-10/100, neural machine translation, and Penn Treebank.
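
The core update described above can be stated compactly: the inner ("fast") optimizer takes k steps on the fast weights θ, after which the slow weights φ are pulled toward them via φ ← φ + α(θ − φ), and θ is then reset to φ. Below is a minimal PyTorch sketch of such a wrapper, assuming a standard optimizer interface; the hyperparameter names (k, alpha) follow the paper's notation, but the class and its interface are illustrative, not the official API (see michaelrzhang/lookahead under Code Repositories).

```python
import torch

class Lookahead:
    """Illustrative Lookahead wrapper: the inner optimizer advances the
    fast weights theta; every k steps the slow weights phi are updated as
    phi <- phi + alpha * (theta - phi), and theta is reset to phi."""

    def __init__(self, inner_optimizer, k=5, alpha=0.5):
        self.inner = inner_optimizer
        self.k = k
        self.alpha = alpha
        self.step_count = 0
        # Snapshot the initial parameters as the slow weights phi.
        self.slow_weights = [
            [p.detach().clone() for p in group["params"]]
            for group in self.inner.param_groups
        ]

    @torch.no_grad()
    def step(self):
        # One fast-weight step with the inner optimizer (SGD, Adam, ...).
        self.inner.step()
        self.step_count += 1
        if self.step_count % self.k == 0:
            for group, slow in zip(self.inner.param_groups, self.slow_weights):
                for p, phi in zip(group["params"], slow):
                    phi.add_(p - phi, alpha=self.alpha)  # phi += alpha * (theta - phi)
                    p.copy_(phi)                         # theta <- phi

    def zero_grad(self):
        self.inner.zero_grad()
```

Usage mirrors the wrapped optimizer: construct, say, torch.optim.Adam(model.parameters()), pass it to Lookahead(..., k=5, alpha=0.5), and call zero_grad() and step() each iteration as usual. Since the wrapper stores only one extra copy of the parameters and performs one interpolation every k steps, its overhead is small, consistent with the abstract's claim of negligible computation and memory cost.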

Code Repositories

alphadl/lookahead.pytorch (PyTorch)
chizhu/BDC2019 (TensorFlow)
mnikitin/LookaheadOptimizer-mx (MXNet)
kpe/params-flow (TensorFlow)
nsarang/lookahead_keras (TensorFlow)
rwightman/pytorch-image-models (PyTorch)
bojone/keras_lookahead
nachiket273/lookahead_pytorch (PyTorch)
wkcn/LookaheadOptimizer-mx (MXNet)
201419/Optimizer-PyTorch (PyTorch)
Abhimanyu08/Lookahead_Optimizer (PyTorch)
HamadYA/GhostFaceNets (TensorFlow)
michaelrzhang/lookahead (official, PyTorch)

Benchmarks

Benchmark                                        Method     Metrics
stochastic-optimization-on-cifar-10-resnet-18    Lookahead  Accuracy: 95.27%
stochastic-optimization-on-cifar-10-resnet-18    SGD        Accuracy: 95.23%
stochastic-optimization-on-cifar-10-resnet-18    Adam       Accuracy: 94.84%
stochastic-optimization-on-imagenet-resnet-50    SGD        Top-5 Accuracy: 92.15%
stochastic-optimization-on-imagenet-resnet-50    Lookahead  Top-1 Accuracy: 75.13%
stochastic-optimization-on-imagenet-resnet-50-1  Lookahead  Top-1 Accuracy: 75.49%, Top-5 Accuracy: 92.53%
stochastic-optimization-on-imagenet-resnet-50-1  SGD        Top-1 Accuracy: 75.15%, Top-5 Accuracy: 92.56%
