Open-Domain Question Answering on KILT 2
Evaluation Metrics
EM
F1
KILT-EM
KILT-F1
R-Prec
Recall@5
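
For readers unfamiliar with the metrics listed above, the following is a minimal Python sketch of how they are commonly computed. It is a simplification and not the official KILT evaluation script: it assumes a single gold answer per question, represents provenance as Wikipedia page identifiers, and uses SQuAD-style answer normalization. All function names (`normalize`, `exact_match`, `token_f1`, `r_precision`, `recall_at_k`, `kilt_score`) are hypothetical names chosen for illustration.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """EM: 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """F1: harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def r_precision(ranked_pages: list, gold_pages: set) -> float:
    """R-Prec: fraction of the top-R retrieved pages that are gold, R = |gold|."""
    r = len(gold_pages)
    if r == 0:
        return 0.0
    return len(set(ranked_pages[:r]) & gold_pages) / r


def recall_at_k(ranked_pages: list, gold_pages: set, k: int = 5) -> float:
    """Recall@k: fraction of gold pages found among the top-k retrieved pages."""
    if not gold_pages:
        return 0.0
    return len(set(ranked_pages[:k]) & gold_pages) / len(gold_pages)


def kilt_score(downstream_score: float, rprec: float) -> float:
    """KILT-EM / KILT-F1: count the answer score only when R-Prec is 1."""
    return downstream_score if rprec == 1.0 else 0.0
```

Under these assumptions, a system that answers correctly but retrieves the wrong page earns EM credit and no KILT-EM credit, which is why the KILT-EM and KILT-F1 columns are never higher than the EM and F1 columns for the same system.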
Evaluation Results
Performance of each model on this benchmark
| Model | EM | F1 | KILT-EM | KILT-F1 | R-Prec | Recall@5 | Paper Title | Repository |
|---|---|---|---|---|---|---|---|---|
| Re2G | 76.27 | 81.4 | 57.91 | 61.78 | 72.68 | 74.23 | Re2G: Retrieve, Rerank, Generate | |
| Sphere | 73.06 | 80.33 | 0.0 | 0.0 | 0.0 | 0.0 | - | - |
| Wikipedia | 72.73 | 79.54 | 45.55 | 49.57 | 58.85 | 71.55 | - | - |
| RAG | 71.27 | 75.88 | 38.13 | 40.15 | 48.68 | 57.13 | - | - |
| intersect | 70.86 | 77.29 | 50.56 | 54.99 | 68.36 | 76.36 | - | - |
| BERT + DPR | 70.38 | 74.41 | 34.48 | 36.28 | 43.4 | 31.45 | - | - |
| KGI_0 | 60.99 | 66.55 | 42.85 | 46.08 | 60.49 | 63.54 | - | - |
| Multitask DPR + BART | 59.6 | 66.53 | 42.36 | 46.19 | 61.49 | 68.33 | - | - |
| BART + DPR | 58.55 | 67.79 | 31.4 | 35.34 | 44.49 | 56.99 | - | - |
| BART | 32.39 | 39.85 | 0.0 | 0.0 | 0.0 | 0.0 | - | - |
| T5-base | 18.11 | 27.83 | 0.0 | 0.0 | 0.0 | 0.0 | KILT: a Benchmark for Knowledge Intensive Language Tasks | |
| GENRE | 0.0 | 0.0 | 0.0 | 0.0 | 69.16 | 75.07 | - | - |
| chriskuei | 0.0 | 0.0 | 0.0 | 0.0 | 70.19 | 75.64 | - | - |
| Multi-task DPR | 0.0 | 0.0 | 0.0 | 0.0 | 61.49 | 68.33 | - | - |
| TABi | 0.0 | 0.0 | 0.0 | 0.0 | 70.36 | 69.16 | - | - |