Open-Domain Question Answering on KILT ELI5
Evaluation Metrics
F1
KILT-F1
KILT-RL
R-Prec
ROUGE-L
Recall@5
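The page lists the metrics without defining them. The sketch below shows one simplified way to compute them, not the official KILT evaluation code. It assumes whitespace tokenization for F1, a single set of gold page IDs per question for R-Prec and Recall@k, and that the KILT-prefixed scores count the downstream score only when R-Prec is 1. The function names are hypothetical.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a generated and a reference answer.

    Assumption: lowercased whitespace tokens, no further normalization.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def r_precision(retrieved: list[str], gold: set[str]) -> float:
    """Fraction of the top-R retrieved pages that are gold, with R = len(gold)."""
    if not gold:
        return 0.0
    top_r = retrieved[: len(gold)]
    return len(set(top_r) & gold) / len(gold)


def recall_at_k(retrieved: list[str], gold: set[str], k: int = 5) -> float:
    """Fraction of gold pages found in the top-k retrieved pages (simplified)."""
    if not gold:
        return 0.0
    return len(set(retrieved[:k]) & gold) / len(gold)


def kilt_score(downstream: float, rprec: float) -> float:
    """Gate a downstream score (F1 or ROUGE-L) on perfect R-Precision."""
    return downstream if rprec == 1.0 else 0.0
```

Under this reading, a system that returns no provenance has R-Prec 0 and therefore scores 0 on KILT-F1 and KILT-RL, whatever its plain F1 or ROUGE-L.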
Results
Performance of each model on this benchmark.
| Model | F1 | KILT-F1 | KILT-RL | R-Prec | ROUGE-L | Recall@5 | Paper Title | Repository |
|---|---|---|---|---|---|---|---|---|
| somebody | 27.13 | 3.0 | 2.62 | 10.83 | 24.53 | 27.25 | - | - |
| arxiv.org/abs/2103.06332 | 22.88 | 2.34 | 2.36 | 10.67 | 23.19 | 24.56 | Hurdles to Progress in Long-form Question Answering | |
| Training Set Retrieval (top 1) | 21.62 | 0.0 | 0.0 | 0.0 | 18.66 | 0.0 | - | - |
| BART | 19.23 | 0.0 | 0.0 | 0.0 | 20.55 | 0.0 | - | - |
| BART + DPR | 17.88 | 2.01 | 1.9 | 10.67 | 17.41 | 26.92 | - | - |
| Random Training Set Answer | 17.07 | 0.0 | 0.0 | 0.0 | 15.45 | 0.0 | - | - |
| multi-task small | 16.4 | 0.0 | 0.0 | 0.0 | 17.67 | 0.0 | - | - |
| T5-base | 16.1 | 0.0 | 0.0 | 0.0 | 19.08 | 0.0 | KILT: a Benchmark for Knowledge Intensive Language Tasks | |
| Wikipedia | 15.91 | 2.38 | 2.46 | 14.83 | 16.45 | 27.69 | - | - |
| Sphere | 15.29 | 0.0 | 0.0 | 0.0 | 15.76 | 0.0 | - | - |
| Input Copying | 14.8 | 0.0 | 0.0 | 0.0 | 16.88 | 0.0 | - | - |
| RAG | 14.51 | 1.79 | 1.69 | 11.0 | 14.05 | 22.92 | - | - |
| GENRE | 0.0 | 0.0 | 0.0 | 15.83 | 0.0 | 25.49 | - | - |
| chriskuei | 0.0 | 0.0 | 0.0 | 17.5 | 0.0 | 25.54 | - | - |
| TABi | 0.0 | 0.0 | 0.0 | 18.33 | 0.0 | 28.21 | - | - |
| Multi-task DPR | 0.0 | 0.0 | 0.0 | 15.5 | 0.0 | 27.51 | - | - |