
Abstract
Neural language representation models such as BERT, pre-trained on large-scale corpora, capture rich semantic patterns from plain text and, through fine-tuning, consistently improve performance on a wide range of natural language processing (NLP) tasks. However, existing pre-trained language models rarely consider incorporating knowledge graphs (KGs), which provide rich structured knowledge facts that support better language understanding. We argue that informative entities in KGs can enhance language representation with external knowledge. In this paper, we utilize both large-scale textual corpora and knowledge graphs to train an enhanced language representation model (ERNIE, Enhanced Language Representation with Informative Entities), which takes full advantage of lexical, syntactic, and knowledge information simultaneously. Experimental results show that ERNIE achieves significant improvements on various knowledge-driven tasks while remaining comparable to the state-of-the-art model BERT on other common NLP tasks. The source code for this paper is available at https://github.com/thunlp/ERNIE.
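The abstract describes combining token-level representations from a text encoder with entity knowledge from a KG. Below is a minimal, hypothetical PyTorch sketch of that general idea: a fusion layer that merges per-token hidden states with entity embeddings aligned to entity-mention tokens. The class name, dimensions, and masking scheme are illustrative assumptions, not the released thunlp/ERNIE implementation.

```python
# Minimal sketch (not the released thunlp/ERNIE code) of fusing token
# representations from a text encoder with KG entity embeddings.
import torch
import torch.nn as nn


class TokenEntityFusion(nn.Module):
    """Fuse per-token hidden states with aligned KG entity embeddings."""

    def __init__(self, token_dim: int = 768, entity_dim: int = 100, hidden_dim: int = 768):
        super().__init__()
        self.token_proj = nn.Linear(token_dim, hidden_dim)
        self.entity_proj = nn.Linear(entity_dim, hidden_dim)
        self.token_out = nn.Linear(hidden_dim, token_dim)
        self.entity_out = nn.Linear(hidden_dim, entity_dim)
        self.act = nn.GELU()

    def forward(self, token_states, entity_embs, entity_mask):
        # token_states: (batch, seq_len, token_dim) from a text encoder such as BERT
        # entity_embs:  (batch, seq_len, entity_dim) KG embeddings aligned to tokens
        # entity_mask:  (batch, seq_len, 1), 1.0 where a token is linked to an entity
        fused = self.act(self.token_proj(token_states)
                         + entity_mask * self.entity_proj(entity_embs))
        new_tokens = self.act(self.token_out(fused))      # updated token stream
        new_entities = self.act(self.entity_out(fused))   # updated entity stream
        return new_tokens, new_entities


if __name__ == "__main__":
    fusion = TokenEntityFusion()
    tokens = torch.randn(2, 16, 768)
    entities = torch.randn(2, 16, 100)
    mask = torch.zeros(2, 16, 1)
    mask[:, 3] = 1.0  # pretend token 3 in each sequence is linked to a KG entity
    t, e = fusion(tokens, entities, mask)
    print(t.shape, e.shape)  # torch.Size([2, 16, 768]) torch.Size([2, 16, 100])
```

In the paper's setting, such a fusion block would sit on top of a pre-trained text encoder, with entity embeddings pre-trained on the KG (e.g., with TransE) and injected only at entity-mention positions.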
Code Repositories
- Mind23-2/MindCode-136 (MindSpore)
- thunlp/ERNIE (official, PyTorch)
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| entity-linking-on-figer | ERNIE | Accuracy: 57.19, Macro F1: 76.51, Micro F1: 73.39 |
| entity-typing-on-open-entity | ERNIE | F1: 75.56, Precision: 78.42, Recall: 72.9 |
| linguistic-acceptability-on-cola | ERNIE | Accuracy: 52.3% |
| natural-language-inference-on-multinli | ERNIE | Matched: 84.0, Mismatched: 83.2 |
| natural-language-inference-on-qnli | ERNIE | Accuracy: 91.3% |
| natural-language-inference-on-rte | ERNIE | Accuracy: 68.8% |
| paraphrase-identification-on-quora-question | ERNIE | F1: 71.2 |
| relation-classification-on-tacred-1 | BERT | F1: 66.0 |
| relation-classification-on-tacred-1 | ERNIE | F1: 68.0 |
| relation-extraction-on-fewrel | ERNIE | F1: 88.32, Precision: 88.49, Recall: 88.44 |
| relation-extraction-on-tacred | ERNIE | F1: 67.97 |
| semantic-textual-similarity-on-mrpc | ERNIE | Accuracy: 88.2% |
| semantic-textual-similarity-on-sts-benchmark | ERNIE | Pearson Correlation: 0.832 |
| sentiment-analysis-on-sst-2-binary | ERNIE | Accuracy: 93.5 |