A Statistical Framework for Low-bitwidth Training of Deep Neural Networks

Jianfei Chen Yu Gai Zhewei Yao Michael W. Mahoney Joseph E. Gonzalez

Abstract

Fully quantized training (FQT), which uses low-bitwidth hardware by quantizing the activations, weights, and gradients of a neural network model, is a promising approach to accelerate the training of deep neural networks. One major challenge with FQT is the lack of theoretical understanding, in particular of how gradient quantization impacts convergence properties. In this paper, we address this problem by presenting a statistical framework for analyzing FQT algorithms. We view the quantized gradient of FQT as a stochastic estimator of its full precision counterpart, a procedure known as quantization-aware training (QAT). We show that the FQT gradient is an unbiased estimator of the QAT gradient, and we discuss the impact of gradient quantization on its variance. Inspired by these theoretical results, we develop two novel gradient quantizers, and we show that these have smaller variance than the existing per-tensor quantizer. For training ResNet-50 on ImageNet, our 5-bit block Householder quantizer achieves only 0.5% validation accuracy loss relative to QAT, comparable to the existing INT8 baseline.
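The key statistical property in the abstract is that a stochastically rounded low-bitwidth gradient is an unbiased estimator of its full-precision counterpart. A minimal sketch of this property, using a generic per-tensor stochastic-rounding quantizer (a standard construction, not the paper's block Householder quantizer):

```python
import numpy as np

def stochastic_round_quantize(x, bits=5, rng=None):
    """Unbiased per-tensor quantizer: maps x onto the integer grid
    [0, 2^bits - 1], rounding up or down at random with probability
    given by the fractional part, so E[dequant(quant(x))] == x."""
    rng = np.random.default_rng() if rng is None else rng
    lo, hi = x.min(), x.max()
    scale = (2 ** bits - 1) / (hi - lo)
    scaled = (x - lo) * scale
    floor = np.floor(scaled)
    # round up with probability equal to the fractional part -> unbiased
    q = floor + (rng.random(x.shape) < (scaled - floor))
    return q / scale + lo

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)  # stand-in for a full-precision gradient tensor
# Averaging many independent quantizations recovers x (unbiasedness);
# the residual spread is the quantization variance the paper analyzes.
est = np.mean(
    [stochastic_round_quantize(x, bits=5, rng=rng) for _ in range(2000)],
    axis=0,
)
print(np.max(np.abs(est - x)))
```

Because each quantized sample is unbiased, the mean converges to the original tensor as the number of samples grows; the paper's contribution is quantizer designs that shrink the per-sample variance at a fixed bitwidth.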

Code Repositories

cjf00000/StatQuant (official, PyTorch)
gaochang-bjtu/1-bit-fqt (PyTorch)

Benchmarks

Benchmark                                       Methodology              Metrics
Linguistic Acceptability on CoLA                PSQ (Chen et al., 2020)  Accuracy: 67.5
Natural Language Inference on MultiNLI          PSQ (Chen et al., 2020)  Matched: 89.9
Natural Language Inference on QNLI              PSQ (Chen et al., 2020)  Accuracy: 94.5
Natural Language Inference on RTE               PSQ (Chen et al., 2020)  Accuracy: 86.8
Semantic Textual Similarity on MRPC             PSQ (Chen et al., 2020)  Accuracy: 90.4
Semantic Textual Similarity on STS-Benchmark    PSQ (Chen et al., 2020)  Pearson Correlation: 0.919
Sentiment Analysis on SST-2 (binary)            PSQ (Chen et al., 2020)  Accuracy: 96.2
