
Abstract
Recently, the pre-trained language model BERT (and its robustly optimized version, RoBERTa) has attracted a great deal of attention in natural language understanding (NLU) and has achieved state-of-the-art accuracy on a variety of NLU tasks, such as sentiment classification, natural language inference, semantic textual similarity, and question answering. Inspired by the linearization exploration work of Elman [8], we extend BERT to a new model, StructBERT, by incorporating language structures into pre-training. Specifically, we pre-train StructBERT with two auxiliary tasks that make the most of the sequential order of words and sentences, leveraging language structures at the word level and the sentence level, respectively. As a result, the new model is adapted to the different levels of language understanding required by downstream tasks. StructBERT with structural pre-training gives surprisingly good empirical results on a variety of downstream tasks, including a score of 89.0 on the GLUE benchmark (outperforming all published models), an F1 score of 93.0 on SQuAD v1.1 question answering, and an accuracy of 91.7 on SNLI.
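The two auxiliary objectives are easiest to see as data-construction steps. The Python sketch below is only an illustration of the abstract's description, not the authors' code: the trigram length (K = 3), the 5% shuffle rate, and the three-way sentence labels (next / previous / random document) follow the StructBERT paper, while the function names and sampling details here are hypothetical.

```python
import random

K = 3                # subsequence length: trigram shuffling, per the paper
SHUFFLE_RATE = 0.05  # fraction of trigrams perturbed (5%, per the paper)


def word_structural_example(tokens):
    """Word-level objective: shuffle a few trigrams of a token sequence.

    Returns the perturbed sequence plus (position, original trigram)
    targets; the model is trained to restore the original word order.
    """
    tokens = list(tokens)
    targets = []
    if len(tokens) < K:
        return tokens, targets
    # Candidate non-overlapping trigram start positions.
    starts = list(range(0, len(tokens) - K + 1, K))
    n = max(1, int(len(starts) * SHUFFLE_RATE))
    for i in random.sample(starts, n):
        original = tokens[i:i + K]
        shuffled = original[:]
        random.shuffle(shuffled)
        tokens[i:i + K] = shuffled
        targets.append((i, original))
    return tokens, targets


def sentence_structural_example(doc, idx, corpus):
    """Sentence-level objective: build one (S1, S2, label) training pair.

    Labels: 0 = S2 follows S1, 1 = S2 precedes S1,
    2 = S2 is drawn from a different document (1/3 chance each).
    Assumes `corpus` is a list of documents (lists of sentences)
    containing at least one document other than `doc`.
    """
    s1 = doc[idx]
    r = random.random()
    if r < 1 / 3 and idx + 1 < len(doc):
        return s1, doc[idx + 1], 0
    if r < 2 / 3 and idx > 0:
        return s1, doc[idx - 1], 1
    other = random.choice([d for d in corpus if d is not doc])
    return s1, random.choice(other), 2
```

Per the paper, the word-level targets are predicted jointly with BERT's masked-token objective, and the three-way sentence label is predicted from the pooled representation of the pair, analogous to BERT's next-sentence prediction.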
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| linguistic-acceptability-on-cola | StructBERTRoBERTa ensemble | Accuracy: 69.2% |
| natural-language-inference-on-multinli | Adv-RoBERTa ensemble | Matched: 91.1%, Mismatched: 90.7% |
| natural-language-inference-on-qnli | StructBERTRoBERTa ensemble | Accuracy: 99.2% |
| natural-language-inference-on-rte | Adv-RoBERTa ensemble | Accuracy: 88.7% |
| natural-language-inference-on-wnli | StructBERTRoBERTa ensemble | Accuracy: 89.7% |
| paraphrase-identification-on-quora-question | StructBERTRoBERTa ensemble | Accuracy: 90.7%, F1: 74.4% |
| paraphrase-identification-on-wikihop | StructBERTRoBERTa ensemble | Accuracy: 90.7% |
| semantic-textual-similarity-on-mrpc | StructBERTRoBERTa ensemble | Accuracy: 91.5%, F1: 93.6% |
| semantic-textual-similarity-on-sts-benchmark | StructBERTRoBERTa ensemble | Pearson Correlation: 0.928, Spearman Correlation: 0.924 |
| sentiment-analysis-on-sst-2-binary | StructBERTRoBERTa ensemble | Accuracy: 97.1% |