Qihuang Zhong, Liang Ding, Yibing Zhan, Yu Qiao, Yonggang Wen, Li Shen, Juhua Liu, Baosheng Yu, Bo Du, Yixin Chen, Xinbo Gao, Chunyan Miao, Xiaoou Tang, Dacheng Tao

Abstract
This technical report briefly describes our team JDExplore's Vega v2 submission on the SuperGLUE leaderboard. SuperGLUE is more challenging than the widely used General Language Understanding Evaluation (GLUE) benchmark, comprising eight difficult language understanding tasks that cover question answering, natural language inference, word sense disambiguation, coreference resolution, and reasoning. [Method] Instead of arbitrarily scaling up the pretrained language model (PLM), we aim to 1) fully extract the knowledge in the input pretraining data within a given parameter budget (e.g., 6B parameters), and 2) effectively transfer the extracted knowledge to downstream tasks. To achieve goal 1), we propose self-evolution learning for PLMs, which wisely predicts the informative tokens that should be masked and supervises the masked language modeling (MLM) process with rectified smooth labels, improving the model's ability to capture key semantic information. For goal 2), we leverage the prompt transfer technique, which transfers knowledge from the foundation model and related downstream tasks to the target task, substantially improving performance in low-resource scenarios. [Results] According to our submission record (October 2022), with the optimized pretraining and fine-tuning strategies, our 6B-parameter Vega method achieved new state-of-the-art results on 4 of the 8 SuperGLUE tasks, topping the SuperGLUE leaderboard on October 8, 2022, with an average score of 91.3.
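To make the self-evolution learning idea above more concrete, the PyTorch sketch below illustrates one plausible reading of it, not the authors' implementation: it assumes that "informative" tokens can be approximated by the current model's per-token loss (masking the hardest tokens instead of random ones), and that "rectified smooth labels" can be approximated by mixing the one-hot target with the model's own predictive distribution. The mixing weight `alpha`, the `bert-base-uncased` checkpoint, and the helper name `self_evolution_mlm_loss` are illustrative assumptions.

```python
# Minimal sketch of informative-token masking + smoothed MLM supervision.
# Not the Vega v2 implementation; special-token handling is omitted.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def self_evolution_mlm_loss(text, mask_ratio=0.15, alpha=0.1):
    enc = tokenizer(text, return_tensors="pt")
    input_ids = enc["input_ids"]
    labels = input_ids.clone()

    # 1) Score each token by how hard the current model finds it
    #    (a proxy for "informative"), then mask the hardest ones.
    with torch.no_grad():
        ref_logits = model(**enc).logits            # [1, seq_len, vocab]
        token_loss = F.cross_entropy(
            ref_logits[0], input_ids[0], reduction="none"
        )
    k = max(1, int(mask_ratio * input_ids.size(1)))
    masked_pos = token_loss.topk(k).indices
    input_ids[0, masked_pos] = tokenizer.mask_token_id

    # 2) Supervise MLM with smoothed ("rectified") labels: blend the one-hot
    #    target with the model's own pre-masking prediction at each position.
    out_logits = model(input_ids=input_ids,
                       attention_mask=enc["attention_mask"]).logits
    losses = []
    for pos in masked_pos:
        one_hot = F.one_hot(labels[0, pos],
                            num_classes=out_logits.size(-1)).float()
        with torch.no_grad():
            ref = F.softmax(ref_logits[0, pos], dim=-1)
        target = (1 - alpha) * one_hot + alpha * ref
        log_probs = F.log_softmax(out_logits[0, pos], dim=-1)
        losses.append(-(target * log_probs).sum())
    return torch.stack(losses).mean()

loss = self_evolution_mlm_loss("Vega v2 ranks first on the SuperGLUE leaderboard.")
loss.backward()
```

A full pretraining loop would additionally rectify the smoothed labels with model confidence and schedule the masking policy over training, but the sketch captures the two mechanisms named in the abstract: choosing semantically rich tokens to mask and softening the MLM targets.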
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| common-sense-reasoning-on-record | Vega v2 6B (fine-tuned) | EM: 93.9, F1: 94.4 |
| common-sense-reasoning-on-record | Turing NLR v5 XXL 5.4B (fine-tuned) | EM: 95.9, F1: 96.4 |
| coreference-resolution-on-winograd-schema | Turing NLR v5 XXL 5.4B (fine-tuned) | Accuracy: 97.3 |
| coreference-resolution-on-winograd-schema | Vega v2 6B (KD-based prompt transfer) | Accuracy: 98.6 |
| natural-language-inference-on-commitmentbank | Turing NLR v5 XXL 5.4B (fine-tuned) | Accuracy: 97.6, F1: 95.9 |
| natural-language-inference-on-commitmentbank | Vega v2 6B (KD-based prompt transfer) | Accuracy: 99.2, F1: 98.6 |
| natural-language-inference-on-rte | Vega v2 6B (KD-based prompt transfer) | Accuracy: 96 |
| natural-language-inference-on-rte | Turing NLR v5 XXL 5.4B (fine-tuned) | Accuracy: 94.1 |
| question-answering-on-boolq | Vega v2 6B (fine-tuned) | Accuracy: 90.5 |
| question-answering-on-boolq | Turing NLR v5 XXL 5.4B (fine-tuned) | Accuracy: 92 |
| question-answering-on-copa | Vega v2 6B (KD-based prompt transfer) | Accuracy: 99.4 |
| question-answering-on-copa | Turing NLR v5 XXL 5.4B (fine-tuned) | Accuracy: 98.2 |
| question-answering-on-multirc | Turing NLR v5 XXL 5.4B (fine-tuned) | EM: 63, F1: 88.4 |
| question-answering-on-multirc | Vega v2 6B (fine-tuned) | EM: 62.4, F1: 88.2 |
| word-sense-disambiguation-on-words-in-context | Vega v2 6B (fine-tuned) | Accuracy: 77.4 |
| word-sense-disambiguation-on-words-in-context | Turing NLR v5 XXL 5.4B (fine-tuned) | Accuracy: 77.1 |