Q8BERT: Quantized 8Bit BERT
Ofir Zafrir, Guy Boudoukh, Peter Izsak, Moshe Wasserblat

Abstract
Recently, pre-trained Transformer-based language models such as BERT and GPT have shown great improvements in many Natural Language Processing (NLP) tasks. However, these models contain a large number of parameters, and the emergence of even larger and more accurate models such as GPT-2 and Megatron suggests a trend toward ever-larger pre-trained Transformer models. Using these large models in production environments is a complex task requiring large amounts of compute, memory, and power. In this work we show how to perform quantization-aware training during the fine-tuning phase of BERT in order to compress BERT by $4\times$ with minimal accuracy loss. Furthermore, the produced quantized model can accelerate inference when run on hardware that supports 8-bit integer operations.
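To make the idea concrete, below is a minimal sketch of quantization-aware training via "fake quantization" during fine-tuning: weights and activations are rounded to 8-bit integer levels in the forward pass, while gradients flow through a straight-through estimator. The names `FakeQuantLinear` and `fake_quantize` are illustrative rather than the paper's released code, and this simplified version derives the activation range from the current tensor's maximum instead of a running statistic.

```python
# Sketch of quantization-aware fine-tuning with fake quantization (PyTorch).
# Assumption: names and the simple per-tensor symmetric scheme are illustrative.
import torch
import torch.nn as nn


def fake_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Symmetric linear quantization to `num_bits` integer levels, then
    de-quantization back to float so the rest of the network stays in FP32."""
    qmax = 2 ** (num_bits - 1) - 1                       # 127 for 8 bits
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    x_q = torch.clamp(torch.round(x / scale), -qmax, qmax) * scale
    # Straight-through estimator: forward pass uses x_q, backward uses identity.
    return x + (x_q - x).detach()


class FakeQuantLinear(nn.Linear):
    """nn.Linear whose weights and inputs are fake-quantized to 8 bits."""

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        w_q = fake_quantize(self.weight)
        x_q = fake_quantize(input)
        return nn.functional.linear(x_q, w_q, self.bias)


# Usage sketch: replace the fully connected projections of the model being
# fine-tuned (e.g. BERT's attention and feed-forward layers) with
# FakeQuantLinear and fine-tune as usual; the resulting weights can then be
# exported as true INT8 for 8-bit-capable inference hardware.
layer = FakeQuantLinear(768, 768)
out = layer(torch.randn(4, 768))
```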
Benchmarks
| Benchmark | Model | Metric |
|---|---|---|
| Linguistic Acceptability on CoLA | Q8BERT (Zafrir et al., 2019) | Accuracy: 65.0 |
| Natural Language Inference on MultiNLI | Q8BERT (Zafrir et al., 2019) | Matched Accuracy: 85.6 |
| Natural Language Inference on QNLI | Q8BERT (Zafrir et al., 2019) | Accuracy: 93.0 |
| Natural Language Inference on RTE | Q8BERT (Zafrir et al., 2019) | Accuracy: 84.8 |
| Semantic Textual Similarity on MRPC | Q8BERT (Zafrir et al., 2019) | Accuracy: 89.7 |
| Semantic Textual Similarity on STS-B | Q8BERT (Zafrir et al., 2019) | Pearson Correlation: 0.911 |
| Sentiment Analysis on SST-2 (Binary) | Q8BERT (Zafrir et al., 2019) | Accuracy: 94.7 |