Common Sense Reasoning on RuCoS
Evaluation Metrics
Average F1
EM
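The two metrics above can be sketched in code. This is a minimal illustration assuming SQuAD-style token-level F1 and case-insensitive exact match; the official RussianSuperGLUE evaluation script may apply additional normalization (e.g. punctuation stripping or taking the maximum score over several gold answers).

```python
# Hypothetical sketch of Average F1 and EM, not the official scorer.
from collections import Counter

def exact_match(prediction: str, gold: str) -> float:
    # EM: 1.0 if the prediction matches the gold answer (ignoring
    # case and surrounding whitespace), else 0.0.
    return float(prediction.strip().lower() == gold.strip().lower())

def token_f1(prediction: str, gold: str) -> float:
    # Token-level F1: harmonic mean of precision and recall over
    # the multiset of whitespace tokens.
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def average_f1(predictions, golds):
    # Average F1: mean of per-example F1 scores over the dataset.
    return sum(token_f1(p, g) for p, g in zip(predictions, golds)) / len(predictions)
```
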
Evaluation Results
Performance of each model on this benchmark
| Model | Average F1 | EM | Paper Title | Repository |
|---|---|---|---|---|
| Human Benchmark | 0.93 | 0.89 | RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark | - |
| Golden Transformer | 0.92 | 0.924 | - | - |
| YaLM 1.0B few-shot | 0.86 | 0.859 | - | - |
| ruT5-large-finetune | 0.81 | 0.764 | - | - |
| ruT5-base-finetune | 0.79 | 0.752 | - | - |
| ruBert-base finetune | 0.74 | 0.716 | - | - |
| ruRoberta-large finetune | 0.73 | 0.716 | - | - |
| ruBert-large finetune | 0.68 | 0.658 | - | - |
| RuGPT3XL few-shot | 0.67 | 0.665 | - | - |
| MT5 Large | 0.57 | 0.562 | mT5: A massively multilingual pre-trained text-to-text transformer | - |
| SBERT_Large | 0.36 | 0.351 | - | - |
| SBERT_Large_mt_ru_finetuning | 0.35 | 0.347 | - | - |
| RuBERT plain | 0.32 | 0.314 | - | - |
| Multilingual Bert | 0.29 | 0.29 | - | - |
| Baseline TF-IDF1.1 | 0.26 | 0.252 | RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark | - |
| heuristic majority | 0.26 | 0.257 | Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks | - |
| majority_class | 0.25 | 0.247 | Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks | - |
| Random weighted | 0.25 | 0.247 | Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks | - |
| RuGPT3Medium | 0.23 | 0.224 | - | - |
| RuBERT conversational | 0.22 | 0.218 | - | - |