TinyBERT: Distilling BERT for Natural Language Understanding

Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu

Abstract

Language model pre-training, such as BERT, has significantly improved the performance of many natural language processing tasks. However, pre-trained language models are usually computationally expensive, so it is difficult to execute them efficiently on resource-restricted devices. To accelerate inference and reduce model size while maintaining accuracy, we first propose a novel Transformer distillation method that is specially designed for knowledge distillation (KD) of Transformer-based models. By leveraging this new KD method, the abundant knowledge encoded in a large teacher BERT can be effectively transferred to a small student TinyBERT. Then, we introduce a new two-stage learning framework for TinyBERT, which performs Transformer distillation at both the pre-training and task-specific learning stages. This framework ensures that TinyBERT can capture the general-domain as well as the task-specific knowledge in BERT. TinyBERT with 4 layers is empirically effective and achieves more than 96.8% of the performance of its teacher BERT-Base on the GLUE benchmark, while being 7.5x smaller and 9.4x faster at inference. TinyBERT with 4 layers is also significantly better than 4-layer state-of-the-art baselines on BERT distillation, with only about 28% of their parameters and about 31% of their inference time. Moreover, TinyBERT with 6 layers performs on par with its teacher BERT-Base.
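
To make the Transformer distillation idea above concrete, here is a minimal PyTorch sketch of the kinds of losses involved: an MSE on attention matrices, an MSE on hidden states through a learned projection into the teacher's hidden space, and a soft cross-entropy on temperature-scaled logits. The class name, tensor shapes, hidden sizes (312 for the student, 768 for the teacher), and the one-to-one layer pairing are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TransformerDistillLoss(nn.Module):
    """Per-layer attention/hidden-state losses plus a soft cross-entropy on logits."""

    def __init__(self, student_hidden=312, teacher_hidden=768, temperature=1.0):
        super().__init__()
        # Learned projection mapping student hidden states into the teacher's space
        self.proj = nn.Linear(student_hidden, teacher_hidden, bias=False)
        self.temperature = temperature

    def layer_loss(self, attn_s, attn_t, hidden_s, hidden_t):
        # attn_*: (batch, heads, seq, seq) attention matrices of a mapped layer pair
        # hidden_*: (batch, seq, hidden) outputs of the same layer pair
        attn_loss = F.mse_loss(attn_s, attn_t)
        hidden_loss = F.mse_loss(self.proj(hidden_s), hidden_t)
        return attn_loss + hidden_loss

    def prediction_loss(self, logits_s, logits_t):
        # Soft cross-entropy between temperature-scaled teacher and student logits
        t = self.temperature
        log_p_s = F.log_softmax(logits_s / t, dim=-1)
        p_t = F.softmax(logits_t / t, dim=-1)
        return -(p_t * log_p_s).sum(dim=-1).mean()


# Toy usage with random tensors (batch=2, heads=12, seq=8, 2-class logits)
loss_fn = TransformerDistillLoss()
attn_s, attn_t = torch.randn(2, 12, 8, 8), torch.randn(2, 12, 8, 8)
hid_s, hid_t = torch.randn(2, 8, 312), torch.randn(2, 8, 768)
total = (loss_fn.layer_loss(attn_s, attn_t, hid_s, hid_t)
         + loss_fn.prediction_loss(torch.randn(2, 2), torch.randn(2, 2)))
print(total)
```

In the two-stage framework, the layer-wise losses would be applied during both general distillation (on a pre-training corpus) and task-specific distillation, while the prediction loss is used in the task-specific stage.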

Benchmarks

Benchmark | Methodology | Metrics
linguistic-acceptability-on-cola | TinyBERT-4 14.5M | Accuracy: 43.3%
linguistic-acceptability-on-cola-dev | TinyBERT-6 67M | Accuracy: 54
natural-language-inference-on-multinli | TinyBERT-6 67M | Matched: 84.6, Mismatched: 83.2
natural-language-inference-on-multinli | TinyBERT-4 14.5M | Matched: 82.5, Mismatched: 81.8
natural-language-inference-on-multinli-dev | TinyBERT-6 67M | Matched: 84.5, Mismatched: 84.5
natural-language-inference-on-qnli | TinyBERT-4 14.5M | Accuracy: 87.7%
natural-language-inference-on-qnli | TinyBERT-6 67M | Accuracy: 90.4%
natural-language-inference-on-rte | TinyBERT-4 14.5M | Accuracy: 62.9%
natural-language-inference-on-rte | TinyBERT-6 67M | Accuracy: 66%
paraphrase-identification-on-quora-question | TinyBERT | F1: 71.3
question-answering-on-squad11-dev | TinyBERT-6 67M | EM: 79.7, F1: 87.5
question-answering-on-squad20-dev | TinyBERT-6 67M | EM: 69.9, F1: 73.4
semantic-textual-similarity-on-mrpc | TinyBERT-6 67M | Accuracy: 87.3%
semantic-textual-similarity-on-mrpc | TinyBERT-4 14.5M | Accuracy: 86.4%
semantic-textual-similarity-on-mrpc-dev | TinyBERT-6 67M | Accuracy: 86.3
semantic-textual-similarity-on-sts-benchmark | TinyBERT-4 14.5M | Pearson Correlation: 0.799
sentiment-analysis-on-sst-2-binary | TinyBERT-6 67M | Accuracy: 93.1
sentiment-analysis-on-sst-2-binary | TinyBERT-4 14.5M | Accuracy: 92.6
