Open-Domain Question Answering on KILT
Evaluation Metrics
EM
F1
KILT-EM
KILT-F1
R-Prec
Recall@5
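For readers unfamiliar with the first two metrics, the following is a minimal sketch of how EM and token-level F1 are commonly computed for short-answer QA. It assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles); the function names `normalize`, `exact_match`, and `token_f1` are illustrative and are not taken from the official KILT evaluation scripts, which may differ in detail.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Per-question scores like these are typically averaged over the dataset and reported as percentages, which is the scale used in the table below.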
Results
Performance of each model on this benchmark
| Model | EM | F1 | KILT-EM | KILT-F1 | R-Prec | Recall@5 | Paper Title | Repository |
|---|---|---|---|---|---|---|---|---|
| intersect | 53.74 | 62.24 | 38.78 | 44.4 | 63.16 | 68.19 | - | - |
| Re2G | 51.73 | 60.97 | 43.56 | 49.8 | 70.78 | 76.63 | Re2G: Retrieve, Rerank, Generate | |
| Wikipedia | 51.59 | 60.83 | 35.32 | 40.73 | 59.83 | 71.17 | - | - |
| Sphere | 46.05 | 56.57 | 0.0 | 0.0 | 0.0 | 0.0 | - | - |
| KGI_0 | 45.22 | 53.38 | 36.36 | 41.83 | 63.71 | 70.17 | - | - |
| RAG | 44.39 | 52.35 | 32.69 | 37.91 | 59.49 | 67.06 | - | - |
| BART + DPR | 41.27 | 49.54 | 30.06 | 34.72 | 54.29 | 65.52 | - | - |
| Multitask DPR + BART | 39.75 | 48.43 | 29.09 | 34.7 | 59.42 | 68.24 | - | - |
| BERT + DPR | 38.64 | 47.09 | 31.99 | 37.58 | 60.66 | 46.79 | - | - |
| BART | 21.75 | 28.69 | 0.0 | 0.0 | 0.0 | 0.0 | - | - |
| T5-base | 19.6 | 27.73 | 0.0 | 0.0 | 0.0 | 0.0 | KILT: a Benchmark for Knowledge Intensive Language Tasks | |
| multi-task small | 0.35 | 3.72 | 0.0 | 0.0 | 0.0 | 0.0 | - | - |
| Multi-task DPR | 0.0 | 0.0 | 0.0 | 0.0 | 59.42 | 68.24 | - | - |
| TABi | 0.0 | 0.0 | 0.0 | 0.0 | 62.6 | 64.95 | - | - |
| chriskuei | 0.0 | 0.0 | 0.0 | 0.0 | 60.32 | 61.21 | - | - |
| GENRE | 0.0 | 0.0 | 0.0 | 0.0 | 60.25 | 61.36 | - | - |