Question Answering on NarrativeQA
Evaluation Metrics
BLEU-1
BLEU-4
METEOR
ROUGE-L
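A minimal sketch of scoring a single predicted answer against its reference answers with these four metrics, using the `nltk` and `rouge_score` packages. These libraries are assumptions for illustration only; the official NarrativeQA evaluation scripts may tokenize, smooth, and average differently, and leaderboard figures are per-example scores averaged over the test set and reported ×100.

```python
# Sketch: per-example BLEU-1, BLEU-4, METEOR, and ROUGE-L scoring.
# Assumes `pip install nltk rouge_score` and nltk.download("wordnet") for METEOR.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score
from rouge_score import rouge_scorer

references = ["mark twain wrote the novel", "the novel was written by mark twain"]
prediction = "mark twain wrote it"

ref_tokens = [r.split() for r in references]
pred_tokens = prediction.split()
smooth = SmoothingFunction().method1

# BLEU-1: unigram precision; BLEU-4: geometric mean of 1..4-gram precisions.
bleu1 = sentence_bleu(ref_tokens, pred_tokens,
                      weights=(1, 0, 0, 0), smoothing_function=smooth)
bleu4 = sentence_bleu(ref_tokens, pred_tokens,
                      weights=(0.25, 0.25, 0.25, 0.25), smoothing_function=smooth)

# METEOR: unigram matching with stemming/synonymy against all references.
meteor = meteor_score(ref_tokens, pred_tokens)

# ROUGE-L: longest-common-subsequence F1; keep the best reference.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
rouge_l = max(scorer.score(r, prediction)["rougeL"].fmeasure for r in references)

print(f"BLEU-1 {bleu1:.4f}  BLEU-4 {bleu4:.4f}  "
      f"METEOR {meteor:.4f}  ROUGE-L {rouge_l:.4f}")
```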
Evaluation Results
Performance of each model on this benchmark
| Model | BLEU-1 | BLEU-4 | METEOR | ROUGE-L | Paper Title | Repository |
|---|---|---|---|---|---|---|
| Oracle IR Models | 54.60/55.55 | 26.71/27.78 | - | - | The NarrativeQA Reading Comprehension Challenge | |
| Masque (NarrativeQA + MS MARCO) | 54.11 | 30.43 | 26.13 | 59.87 | Multi-style Generative Reading Comprehension | - |
| Masque (NarrativeQA only) | 48.7 | 20.98 | 21.95 | 54.74 | Multi-style Generative Reading Comprehension | - |
| DecaProp | 44.35 | 27.61 | 21.80 | 44.69 | Densely Connected Attention Propagation for Reading Comprehension | |
| MHPGM + NOIC | 43.63 | 21.07 | 19.03 | 44.16 | Commonsense for Generative Multi-Hop Question Answering Tasks | |
| ConZNet | 42.76 | 22.49 | 19.24 | 46.67 | Cut to the Chase: A Context Zoom-in Network for Reading Comprehension | - |
| BiAttention + DCU-LSTM | 36.55 | 19.79 | 17.87 | 41.44 | Multi-Granular Sequence Encoding via Dilated Compositional Units for Reading Comprehension | - |
| FiD+Distil | 35.3 | 7.5 | 11.1 | 32 | Distilling Knowledge from Reader to Retriever for Question Answering | |
| BiDAF | 33.45 | 15.69 | 15.68 | 36.74 | Bidirectional Attention Flow for Machine Comprehension | |
| BERT-QA with Hard EM objective | - | - | - | 58.8 | A Discrete Hard EM Approach for Weakly Supervised Question Answering | - |