Are Straight-Through gradients and Soft-Thresholding all you need for Sparse Training?

Antoine Vanderschueren, Christophe De Vleeschouwer


Abstract

Setting weights to zero when training a neural network helps reduce the computational complexity at inference. To progressively increase the sparsity ratio in the network without causing sharp weight discontinuities during training, our work combines soft-thresholding and straight-through gradient estimation to update the raw, i.e. non-thresholded, version of zeroed weights. Our method, named ST-3 for straight-through/soft-thresholding/sparse-training, obtains state-of-the-art results, both in terms of accuracy/sparsity and accuracy/FLOPS trade-offs, when progressively increasing the sparsity ratio in a single training cycle. In particular, despite its simplicity, ST-3 compares favorably to the most recent methods adopting differentiable formulations or bio-inspired neuroregeneration principles. This suggests that the key ingredients for effective sparsification primarily lie in the ability to give the weights the freedom to evolve smoothly across the zero state while progressively increasing the sparsity ratio. Source code and weights are available at https://github.com/vanderschuea/stthree
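
To make the recipe in the abstract concrete, below is a minimal PyTorch sketch of the two ingredients: soft-thresholding of the weights in the forward pass, and a straight-through gradient in the backward pass so that the raw, non-thresholded weights keep receiving updates even when their effective value is zero. The class names (`STESoftThreshold`, `SparseLinear`) and the per-layer threshold selection are illustrative assumptions, not the authors' implementation; refer to the official repository linked above for the actual ST-3 code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class STESoftThreshold(torch.autograd.Function):
    """Soft-thresholding forward, straight-through (identity) gradient backward."""

    @staticmethod
    def forward(ctx, weight, tau):
        # Shrink magnitudes by tau and zero everything whose magnitude falls below tau.
        return torch.sign(weight) * torch.clamp(weight.abs() - tau, min=0.0)

    @staticmethod
    def backward(ctx, grad_output):
        # Pass the gradient through unchanged so zeroed weights keep evolving.
        return grad_output, None


class SparseLinear(nn.Linear):
    """Linear layer whose effective weight is a soft-thresholded view of the raw weight."""

    def __init__(self, in_features, out_features, bias=True):
        super().__init__(in_features, out_features, bias)
        self.sparsity = 0.0  # current target sparsity ratio, raised progressively during training

    def forward(self, x):
        if self.sparsity > 0.0:
            # Choose tau as the magnitude below which a `sparsity` fraction of the weights fall.
            k = max(int(self.sparsity * self.weight.numel()), 1)
            tau = self.weight.detach().abs().flatten().kthvalue(k).values
            effective_weight = STESoftThreshold.apply(self.weight, tau)
        else:
            effective_weight = self.weight
        return F.linear(x, effective_weight, self.bias)
```

In this sketch, `self.sparsity` would be raised from 0 to the final target over training following a smooth schedule (for example a polynomial ramp-up), so weights cross the zero state gradually rather than being pruned abruptly; this illustrates the mechanism described in the abstract under the stated assumptions rather than reproducing the paper's exact procedure.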

Code Repositories

vanderschuea/stthree (Official, PyTorch)

Benchmarks

Benchmark: network-pruning-on-imagenet-resnet-50-90
Methodology: ST-3
Metrics: Top-1 Accuracy: 76.03
