Do We Actually Need Dense Over-Parameterization? In-Time Over-Parameterization in Sparse Training

Shiwei Liu, Lu Yin, Decebal Constantin Mocanu, Mykola Pechenizkiy

Abstract

In this paper, we introduce a new perspective on training deep neural networks that reaches state-of-the-art performance without expensive dense over-parameterization, by proposing the concept of In-Time Over-Parameterization (ITOP) in sparse training. By starting from a random sparse network and continuously exploring sparse connectivities during training, we can perform over-parameterization in the space-time manifold, closing the gap in expressibility between sparse training and dense training. We further use ITOP to understand the underlying mechanism of Dynamic Sparse Training (DST) and show that the benefits of DST come from its ability to consider, across time, all possible parameters when searching for the optimal sparse connectivity. As long as sufficient parameters have been reliably explored during training, DST can outperform the dense neural network by a large margin. We present a series of experiments to support our conjecture and achieve state-of-the-art sparse training performance with ResNet-50 on ImageNet. More impressively, our method clearly outperforms over-parameterization-based sparse methods at extreme sparsity levels. When trained on CIFAR-100, our method can match the performance of the dense model even at an extreme sparsity level of 98%. Code can be found at https://github.com/Shiweiliuiiiiiii/In-Time-Over-Parameterization.
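
The mechanism the abstract refers to is a fixed-budget prune-and-grow cycle: the network stays sparse at every step, but periodically dropping some active connections and growing new ones lets training visit far more parameters over time than are active at any single moment. The snippet below is a minimal, hypothetical sketch of that bookkeeping on a single weight tensor, assuming magnitude-based pruning and random regrowth; the names (prune_and_grow, explored) and the toy update loop are illustrative, not taken from the paper's code, which is available in the repositories listed below.

```python
# Minimal sketch of the In-Time Over-Parameterization (ITOP) idea during
# Dynamic Sparse Training (DST), illustrated on a single flattened weight
# tensor. Assumptions (not from the paper's code): magnitude-based pruning,
# random regrowth, and a noise update standing in for SGD.
import torch

torch.manual_seed(0)

n_total = 256 * 256
sparsity = 0.9                                    # 90% of weights are zero at any moment
n_active = int(n_total * (1 - sparsity))

weight = torch.randn(n_total)
mask = torch.zeros(n_total, dtype=torch.bool)
mask[torch.randperm(n_total)[:n_active]] = True   # random initial sparse topology

# ITOP bookkeeping: every parameter that has been active at least once.
explored = mask.clone()


def prune_and_grow(weight, mask, drop_frac=0.3):
    """One topology update: drop the smallest-magnitude active weights and
    regrow the same number of currently inactive connections at random."""
    active = mask.nonzero(as_tuple=True)[0]
    n_drop = int(drop_frac * active.numel())

    # Prune: deactivate the active weights with the smallest magnitude.
    drop = active[torch.argsort(weight[active].abs())[:n_drop]]
    mask[drop] = False

    # Grow: activate an equal number of inactive weights, initialized to zero.
    inactive = (~mask).nonzero(as_tuple=True)[0]
    grow = inactive[torch.randperm(inactive.numel())[:n_drop]]
    weight[grow] = 0.0
    mask[grow] = True


for step in range(50):
    # Stand-in for gradient updates on the active weights between topology changes.
    weight += 0.01 * torch.randn(n_total) * mask
    prune_and_grow(weight, mask)
    explored |= mask                              # record newly activated weights

# Fraction of all parameters explored across time, although only 10% are
# active at any given step.
print(f"explored fraction after 50 updates: {explored.float().mean().item():.2f}")
```

The printed fraction corresponds roughly to the quantity the abstract appeals to: the share of all parameters that have been reliably explored during training, which the paper argues must be large enough for DST to outperform dense training.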

Code Repositories

vita-group/granet (PyTorch, mentioned in GitHub)
Shiweiliuiiiiiii/GraNet (PyTorch, mentioned in GitHub)
stevenboys/agent (PyTorch, mentioned in GitHub)

Benchmarks

Benchmark                      Methodology                          Metrics
sparse-learning-on-imagenet    ResNet-50, 90% sparse, 100 epochs    Top-1 Accuracy: 73.82
sparse-learning-on-imagenet    ResNet-50, 80% sparse, 100 epochs    Top-1 Accuracy: 75.84
