Mathematical Reasoning on Lila (IID)
Evaluation Metric
Accuracy
Evaluation Results
Accuracy of each model on this benchmark
| Model | Accuracy | Paper Title | Repository |
|---|---|---|---|
| Codex (Few-Shot, 175B) | 0.604 | Lila: A Unified Benchmark for Mathematical Reasoning | |
| Bhāskara-P (Fine-tuned, 2.7B) | 0.48 | Lila: A Unified Benchmark for Mathematical Reasoning | |
| Neo-P (Fine-tuned, 2.7B) | 0.394 | Lila: A Unified Benchmark for Mathematical Reasoning | |
| GPT-3 (Few-Shot, 175B) | 0.384 | Lila: A Unified Benchmark for Mathematical Reasoning | |
| Bhāskara-A (Fine-tuned, 2.7B) | 0.252 | Lila: A Unified Benchmark for Mathematical Reasoning | |
| Neo-A (Fine-tuned, 2.7B) | 0.204 | Lila: A Unified Benchmark for Mathematical Reasoning | |