
StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding

Wei Wang, Bin Bi, Ming Yan, Chen Wu, Zuyi Bao, Jiangnan Xia, Liwei Peng, Luo Si

Abstract

Recently, the pre-trained language model BERT (and its robustly optimized version, RoBERTa) has attracted a lot of attention in natural language understanding (NLU) and achieved state-of-the-art accuracy on various NLU tasks, such as sentiment classification, natural language inference, semantic textual similarity, and question answering. Inspired by the linearization exploration work of Elman [8], we extend BERT to a new model, StructBERT, by incorporating language structures into pre-training. Specifically, we pre-train StructBERT with two auxiliary tasks that make the most of the sequential order of words and sentences, leveraging language structures at the word and sentence levels, respectively. As a result, the new model is adapted to the different levels of language understanding required by downstream tasks. StructBERT with structural pre-training gives surprisingly good empirical results on a variety of downstream tasks, including pushing the state of the art on the GLUE benchmark to 89.0 (outperforming all published models), the F1 score on SQuAD v1.1 question answering to 93.0, and the accuracy on SNLI to 91.7.
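The abstract only names the two structural objectives, so here is a minimal sketch of how the corresponding training examples could be constructed. The function names, the span size k=3 (the trigram shuffling described in the paper), and the uniform sampling of the three sentence-pair cases are assumptions drawn from the paper's description, not details given on this page.

```python
import random

def word_structural_example(tokens, k=3):
    """Word-level objective: shuffle a random k-token span (trigrams in the
    paper) and train the model to reconstruct the original order.
    Returns (corrupted tokens, original span, span start index)."""
    if len(tokens) < k:
        return tokens, tokens, 0
    start = random.randrange(len(tokens) - k + 1)
    span = tokens[start:start + k]
    shuffled = span[:]
    random.shuffle(shuffled)
    corrupted = tokens[:start] + shuffled + tokens[start + k:]
    return corrupted, span, start

def sentence_structural_example(doc_sentences, all_sentences):
    """Sentence-level objective: for a pair (S1, S2), classify whether S2
    is the next sentence (label 0), the previous sentence (label 1), or a
    random sentence (label 2); the paper samples the three cases with
    equal probability."""
    i = random.randrange(len(doc_sentences) - 1)
    s1 = doc_sentences[i]
    case = random.randrange(3)
    if case == 0:
        s2 = doc_sentences[i + 1]          # next sentence
    elif case == 1:
        s1, s2 = doc_sentences[i + 1], doc_sentences[i]  # previous sentence
    else:
        # random sentence from the corpus (the paper draws it from a
        # different document)
        s2 = random.choice(all_sentences)
    return s1, s2, case

# Toy usage
tokens = "the quick brown fox jumps over the lazy dog".split()
print(word_structural_example(tokens))
```

These examples would then be fed to the usual BERT-style encoder with extra classification heads; the sketch covers only the data construction that the two auxiliary tasks rely on.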

Benchmarks

| Benchmark | Methodology | Metrics |
| --- | --- | --- |
| linguistic-acceptability-on-cola | StructBERTRoBERTa ensemble | Accuracy: 69.2% |
| natural-language-inference-on-multinli | Adv-RoBERTa ensemble | Matched: 91.1%, Mismatched: 90.7% |
| natural-language-inference-on-qnli | StructBERTRoBERTa ensemble | Accuracy: 99.2% |
| natural-language-inference-on-rte | Adv-RoBERTa ensemble | Accuracy: 88.7% |
| natural-language-inference-on-wnli | StructBERTRoBERTa ensemble | Accuracy: 89.7% |
| paraphrase-identification-on-quora-question | StructBERTRoBERTa ensemble | Accuracy: 90.7%, F1: 74.4% |
| paraphrase-identification-on-wikihop | StructBERTRoBERTa ensemble | Accuracy: 90.7% |
| semantic-textual-similarity-on-mrpc | StructBERTRoBERTa ensemble | Accuracy: 91.5%, F1: 93.6% |
| semantic-textual-similarity-on-sts-benchmark | StructBERTRoBERTa ensemble | Pearson Correlation: 0.928, Spearman Correlation: 0.924 |
| sentiment-analysis-on-sst-2-binary | StructBERTRoBERTa ensemble | Accuracy: 97.1% |
