Natural Language Inference on RCB
Evaluation Metrics
Accuracy
Average F1
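Both metrics can be computed with standard tooling. The sketch below is a minimal example, not the official RussianSuperGLUE scorer: it assumes RCB's three-way entailment labels and treats "Average F1" as the macro (unweighted) mean of the per-class F1 scores, mirroring how the analogous CommitmentBank task is scored in SuperGLUE. The toy gold/predicted labels are illustrative assumptions.

```python
# Minimal sketch of the two leaderboard metrics (not the official scorer).
from sklearn.metrics import accuracy_score, f1_score

# Toy three-way entailment labels (assumed label names, illustrative data).
gold = ["entailment", "contradiction", "neutral", "entailment"]
pred = ["entailment", "contradiction", "neutral", "neutral"]

accuracy = accuracy_score(gold, pred)           # fraction of exact matches
avg_f1 = f1_score(gold, pred, average="macro")  # unweighted mean of per-class F1

print(f"Accuracy:   {accuracy:.3f}")   # 0.750
print(f"Average F1: {avg_f1:.3f}")     # ~0.778
```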
Evaluation Results
Performance of each model on this benchmark is listed below; a hedged sketch of loading the underlying RCB data follows the table.
| Model | Accuracy | Average F1 | Paper Title | Repository |
|---|---|---|---|---|
| Human Benchmark | 0.702 | 0.68 | RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark | - |
| Golden Transformer | 0.546 | 0.406 | - | - |
| ruRoberta-large finetune | 0.518 | 0.357 | - | - |
| ruBert-base finetune | 0.509 | 0.333 | - | - |
| ruBert-large finetune | 0.5 | 0.356 | - | - |
| ruT5-large-finetune | 0.498 | 0.306 | - | - |
| SBERT_Large_mt_ru_finetuning | 0.486 | 0.351 | - | - |
| RuGPT3Large | 0.484 | 0.417 | - | - |
| RuBERT conversational | 0.484 | 0.452 | - | - |
| majority_class | 0.484 | 0.217 | Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks | - |
| RuGPT3Small | 0.473 | 0.356 | - | - |
| ruT5-base-finetune | 0.468 | 0.307 | - | - |
| RuBERT plain | 0.463 | 0.367 | - | - |
| RuGPT3Medium | 0.461 | 0.372 | - | - |
| MT5 Large | 0.454 | 0.366 | mT5: A massively multilingual pre-trained text-to-text transformer | - |
| SBERT_Large | 0.452 | 0.371 | - | - |
| YaLM 1.0B few-shot | 0.447 | 0.408 | - | - |
| Multilingual Bert | 0.445 | 0.367 | - | - |
| Baseline TF-IDF1.1 | 0.441 | 0.301 | RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark | - |
| heuristic majority | 0.438 | 0.4 | Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks | - |
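Scores like those above are typically produced by fine-tuning or prompting a model on the RCB training split and submitting test predictions to the RussianSuperGLUE evaluation server. The sketch below shows one way to load the data, assuming the community "russian_super_glue" dataset on the Hugging Face Hub; the dataset id, config name, and field names are assumptions and may differ.

```python
# Hedged sketch of loading RCB examples (dataset id and fields are assumptions).
from datasets import load_dataset

rcb = load_dataset("russian_super_glue", "rcb")

example = rcb["validation"][0]
print(example["premise"])     # Russian premise text
print(example["hypothesis"])  # Russian hypothesis text
print(example["label"])       # integer id for entailment / contradiction / neutral
```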