HyperAI

3 months ago

On the adequacy of untuned warmup for adaptive optimization

Jerry Ma, Denis Yarats

Abstract

Adaptive optimization algorithms such as Adam are widely used in deep learning. The stability of such algorithms is often improved with a warmup schedule for the learning rate. Motivated by the difficulty of choosing and tuning warmup schedules, recent work proposes automatic variance rectification of Adam's adaptive learning rate, claiming that this rectified approach ("RAdam") surpasses the vanilla Adam algorithm and reduces the need for expensive tuning of Adam with warmup. In this work, we refute this analysis and provide an alternative explanation for the necessity of warmup based on the magnitude of the update term, which is of greater relevance to training stability. We then provide some "rule-of-thumb" warmup schedules, and we demonstrate that simple untuned warmup of Adam performs more or less identically to RAdam in typical practical settings. We conclude by suggesting that practitioners stick to linear warmup with Adam, with a sensible default being linear warmup over $2 / (1 - \beta_2)$ training iterations.
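As a rough illustration of the recommended default, the sketch below computes the linear warmup multiplier $\omega_t = \min(1, t(1 - \beta_2)/2)$ implied by a warmup period of $2 / (1 - \beta_2)$ steps; with $\beta_2 = 0.999$ this is about 2,000 iterations. For comparison it also computes RAdam's rectification term following our reading of the RAdam formulation ($\rho_\infty = 2/(1 - \beta_2) - 1$, with the adaptive step skipped while $\rho_t \le 4$). All function names are hypothetical, and this is an illustrative sketch under those assumptions, not reference code from either paper.

```python
import math


def untuned_linear_warmup_period(beta2: float) -> int:
    """Rule-of-thumb warmup length 2 / (1 - beta2), rounded to whole steps."""
    return round(2.0 / (1.0 - beta2))


def linear_warmup_factor(step: int, beta2: float) -> float:
    """Multiplier on the base learning rate at 1-indexed optimizer step `step`.

    Ramps linearly towards 1 over 2 / (1 - beta2) steps, then stays at 1.
    """
    return min(1.0, step * (1.0 - beta2) / 2.0)


def radam_rectification(step: int, beta2: float) -> float:
    """Assumed RAdam variance-rectification term r_t at 1-indexed step `step`.

    Returns 0.0 while the variance estimate is treated as intractable
    (rho_t <= 4); in that regime RAdam applies an un-adapted momentum
    update instead of the adaptive one, so 0.0 here only means
    "no adaptive step", not "no update at all".
    """
    if step < 1:
        raise ValueError("step is 1-indexed")
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    beta2_t = beta2 ** step
    rho_t = rho_inf - 2.0 * step * beta2_t / (1.0 - beta2_t)
    if rho_t <= 4.0:
        return 0.0
    return math.sqrt(
        (rho_t - 4.0) * (rho_t - 2.0) * rho_inf
        / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t)
    )


if __name__ == "__main__":
    beta2 = 0.999
    print("warmup period:", untuned_linear_warmup_period(beta2))
    # Side-by-side view of the two multipliers early in training.
    for t in (10, 100, 500, 1000, 2000, 5000):
        print(t, round(linear_warmup_factor(t, beta2), 3),
              round(radam_rectification(t, beta2), 3))
```

In a PyTorch training loop, a multiplier like `linear_warmup_factor` could be passed to `torch.optim.lr_scheduler.LambdaLR`, which calls its function with a 0-indexed step count, so something like `lambda s: linear_warmup_factor(s + 1, 0.999)` avoids a zero learning rate on the first step. The Tony-Y/pytorch_warmup repository listed below provides its own implementation.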

Code Repositories

Tony-Y/pytorch_warmup (PyTorch; mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
Image classification on ImageNet | ResNet-50 | Top-1 accuracy: 72.1%
Language modelling on WikiText-103 | Transformer (Adaptive inputs) | Validation perplexity: 19.5
Machine translation on WMT2016 English-German | Transformer | BLEU score: 26.7
