
Revisiting ResNets: Improved Training and Scaling Strategies

Irwan Bello, William Fedus, Xianzhi Du, Ekin D. Cubuk, Aravind Srinivas, Tsung-Yi Lin, Jonathon Shlens, Barret Zoph

Abstract

Novel computer vision architectures monopolize the spotlight, but the impact of the model architecture is often conflated with simultaneous changes to training methodology and scaling strategies. Our work revisits the canonical ResNet (He et al., 2015) and studies these three aspects in an effort to disentangle them. Perhaps surprisingly, we find that training and scaling strategies may matter more than architectural changes, and further, that the resulting ResNets match recent state-of-the-art models. We show that the best performing scaling strategy depends on the training regime and offer two new scaling strategies: (1) scale model depth in regimes where overfitting can occur (width scaling is preferable otherwise); (2) increase image resolution more slowly than previously recommended (Tan & Le, 2019). Using improved training and scaling strategies, we design a family of ResNet architectures, ResNet-RS, which are 1.7x - 2.7x faster than EfficientNets on TPUs, while achieving similar accuracies on ImageNet. In a large-scale semi-supervised learning setup, ResNet-RS achieves 86.2% top-1 ImageNet accuracy, while being 4.7x faster than EfficientNet NoisyStudent. The training techniques improve transfer performance on a suite of downstream tasks (rivaling state-of-the-art self-supervised algorithms) and extend to video classification on Kinetics-400. We recommend practitioners use these simple revised ResNets as baselines for future research.
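The two scaling recommendations in the abstract can be read as a simple selection rule over depth, width, and image resolution. The sketch below is only an illustration of that idea: the function name `suggest_scaling` and all exponents are hypothetical placeholders, not values taken from the paper.

```python
# Illustrative sketch of the paper's two scaling recommendations
# (all numeric exponents below are made-up placeholders):
#   1) scale depth in regimes where overfitting can occur, width otherwise
#   2) increase image resolution more slowly than compound scaling would

def suggest_scaling(compute_multiplier: float, overfitting_regime: bool):
    """Return (depth_multiplier, width_multiplier, image_resolution)."""
    base_resolution = 224

    if overfitting_regime:
        # Long-training / limited-data regime: grow depth, keep width fixed.
        depth_mult, width_mult = compute_multiplier ** 0.7, 1.0
    else:
        # Short-training / data-rich regime: grow width instead.
        depth_mult, width_mult = 1.0, compute_multiplier ** 0.5

    # Grow resolution slowly with compute (small placeholder exponent).
    resolution = int(base_resolution * compute_multiplier ** 0.15)
    return depth_mult, width_mult, resolution


if __name__ == "__main__":
    for c in (1.0, 2.0, 4.0):
        print(c, suggest_scaling(c, overfitting_regime=True))
```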

Benchmarks

Benchmark | Methodology | Metrics
document-image-classification-on-aip | ResNet-RS (ResNet-200 + RS training tricks) | Top 1 Accuracy: 83.4
image-classification-on-imagenet | ResNet-RS-50 (160 image res) | GFLOPs: 4.6; Number of params: 192M; Top 1 Accuracy: 84.4%
image-classification-on-imagenet | ResNet-RS-270 (256 image res) | GFLOPs: 54; Top 1 Accuracy: 83.8%
image-classification-on-prima | ResNet-152 2x (RS training) | Percentage correct: 89.3
