Multi-Task Language Understanding on MGSM
Evaluation Metric
Average (%)
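The page does not define this metric. Assuming Average (%) is the unweighted mean of a model's per-language accuracy across the MGSM languages, the arithmetic can be sketched as below. The function name and the sample scores are hypothetical and are not taken from the leaderboard.

```python
def average_accuracy(per_language_accuracy):
    """Return the unweighted mean of per-language accuracies, in percent.

    Assumption: every language counts equally, regardless of how many
    problems it contains.
    """
    if not per_language_accuracy:
        raise ValueError("need at least one language score")
    return sum(per_language_accuracy.values()) / len(per_language_accuracy)


# Hypothetical per-language accuracies (%), for illustration only.
example_scores = {"de": 50.0, "fr": 40.0, "ja": 30.0}
example_average = average_accuracy(example_scores)
```

If the languages had different numbers of problems, a problem-weighted mean would give a different figure, so the averaging convention should be checked against each cited paper.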
Evaluation Results
Performance of each model on this benchmark
| Model | Average (%) | Paper Title | Repository |
|---|---|---|---|
| PaLM 2 (few-shot, k=8, SC) | 87.0 | PaLM 2 Technical Report | |
| PaLM 2 (8-shot, CoT) | 72.2 | PaLM 2 Technical Report | |
| Flan-PaLM 540B (8-shot, fine-tuned, CoT + SC) | 72.0 | Scaling Instruction-Finetuned Language Models | |
| Flan-U-PaLM 540B (CoT) | 60.4 | Scaling Instruction-Finetuned Language Models | |
| Flan-PaLM 540B (8-shot, fine-tuned, CoT) | 57.0 | Scaling Instruction-Finetuned Language Models | |
| PaLM 540B | 55.0 | PaLM: Scaling Language Modeling with Pathways | |
| U-PaLM 540B (CoT) | 49.9 | Transcending Scaling Laws with 0.1% Extra Compute | |
| text-davinci-003 | 36 | Scaling Instruction-Finetuned Language Models | |
| code-davinci-002 | 35 | Scaling Instruction-Finetuned Language Models | |
| text-davinci-002 | 23.7 | Scaling Instruction-Finetuned Language Models | |
| Flan-PaLM 540B (8-shot, fine-tuned) | 21.2 | Scaling Instruction-Finetuned Language Models | |
| GPT-3 Davinci 175B | 5.7 | Scaling Instruction-Finetuned Language Models | |