
Abstract
The Winograd Schema Challenge (WSC) dataset WSC273 and its inference counterpart WNLI are popular benchmarks for natural language understanding and commonsense reasoning. In this paper, we show that the performance of three language models on WSC273 improves significantly when they are fine-tuned on a dataset of similar pronoun disambiguation problems (denoted WSCR). We additionally generate a large unsupervised WSC-like dataset. By fine-tuning the BERT language model on both the introduced dataset and the WSCR dataset, we achieve overall accuracies of 72.5% and 74.7% on WSC273 and WNLI, improving on the previous state-of-the-art solutions by 8.8% and 9.6%, respectively. Furthermore, our fine-tuned models also prove consistently more robust on the "complex" subsets of WSC273 introduced by Trichelair et al. (2018).
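The underlying trick is to score each candidate referent with BERT's masked-LM head: the ambiguous pronoun is replaced by one [MASK] per sub-token of the candidate, and the candidate whose tokens receive the higher summed log-probability is chosen. The sketch below illustrates this scoring idea only; the Hugging Face `transformers` API, the generic `bert-large-uncased` checkpoint, and the example sentence are assumptions for illustration, and the authors' actual training and evaluation code lives in vid-koci/bert-commonsense.

```python
# Minimal sketch of masked-LM candidate scoring for WSC-style pronoun
# disambiguation (illustrative, not the authors' exact implementation).
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
model = BertForMaskedLM.from_pretrained("bert-large-uncased")
model.eval()

def candidate_log_prob(sentence_with_blank: str, candidate: str) -> float:
    """Replace the blank '_' with one [MASK] per candidate sub-token and
    sum the log-probabilities BERT assigns to those sub-tokens."""
    cand_ids = tokenizer.encode(candidate, add_special_tokens=False)
    masked = sentence_with_blank.replace("_", " ".join(["[MASK]"] * len(cand_ids)))
    input_ids = tokenizer.encode(masked, return_tensors="pt")
    with torch.no_grad():
        logits = model(input_ids).logits
    log_probs = torch.log_softmax(logits[0], dim=-1)
    # Positions of the [MASK] tokens, in order, paired with candidate sub-tokens.
    mask_positions = (input_ids[0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    return sum(log_probs[pos, tok].item() for pos, tok in zip(mask_positions, cand_ids))

# The pronoun slot is marked with '_'; the higher-scoring candidate wins.
sentence = "The trophy doesn't fit in the suitcase because _ is too big."
for cand in ["the trophy", "the suitcase"]:
    print(cand, candidate_log_prob(sentence, cand))
```

Fine-tuning on WSCR, as described in the abstract, adjusts these masked-LM scores so that the correct candidate is preferred by a margin, rather than relying on the pretrained model's raw probabilities.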
Code Repositories
- vid-koci/bert-commonsense (official; PyTorch; mentioned in GitHub)
- TangJiaLong/Knowledge-Projection-for-ERE (PyTorch; mentioned in GitHub)
Benchmarks
| Benchmark | Method | Accuracy (%) |
|---|---|---|
| coreference-resolution-on-winograd-schema | BERT-base 110M (fine-tuned on WSCR) | 62.3 |
| coreference-resolution-on-winograd-schema | BERTwiki 340M (fine-tuned on WSCR) | 72.5 |
| coreference-resolution-on-winograd-schema | BERT-large 340M (fine-tuned on WSCR) | 71.4 |
| coreference-resolution-on-winograd-schema | BERTwiki 340M (fine-tuned on half of WSCR) | 70.3 |
| natural-language-inference-on-wnli | BERT-large 340M (fine-tuned on WSCR) | 71.9 |
| natural-language-inference-on-wnli | BERTwiki 340M (fine-tuned on WSCR) | 74.7 |
| natural-language-inference-on-wnli | BERT-base 110M (fine-tuned on WSCR) | 70.5 |