Common Sense Reasoning on RuCoS

Evaluation Metrics

Average F1
EM
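
The page lists the two metrics without defining them. As a rough illustration, the sketch below computes exact match (EM) and token-level F1 in the style commonly used for reading-comprehension benchmarks, taking the best score over all acceptable gold answers and averaging over examples. The whitespace-and-lowercase normalization and the function names (`normalize`, `token_f1`, `exact_match`, `average_scores`) are assumptions made for illustration; the official RuCoS scorer may normalize text differently.

```python
from collections import Counter


def normalize(text: str) -> list[str]:
    # Assumed normalization: lowercase, then split on whitespace.
    return text.lower().split()


def exact_match(prediction: str, gold: str) -> float:
    # 1.0 only when the normalized token sequences are identical.
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    # Harmonic mean of token precision and recall (bag-of-tokens overlap).
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def average_scores(predictions: list[str], references: list[list[str]]) -> tuple[float, float]:
    # For each example keep the best score over its gold answers,
    # then average over the dataset. Returns (average F1, EM).
    f1_total = 0.0
    em_total = 0.0
    for prediction, golds in zip(predictions, references):
        f1_total += max(token_f1(prediction, g) for g in golds)
        em_total += max(exact_match(prediction, g) for g in golds)
    n = len(predictions)
    return f1_total / n, em_total / n
```

Under these assumptions, a prediction that contains the gold answer plus extra tokens earns partial F1 credit but no EM credit, which is why Average F1 is usually at least as high as EM for a given model.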

Results

Performance of each model on this benchmark.

| Model | Average F1 | EM | Paper Title | Repository |
| --- | --- | --- | --- | --- |
| Human Benchmark | 0.93 | 0.89 | RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark | |
| Golden Transformer | 0.92 | 0.924 | - | - |
| YaLM 1.0B few-shot | 0.86 | 0.859 | - | - |
| ruT5-large-finetune | 0.81 | 0.764 | - | - |
| ruT5-base-finetune | 0.79 | 0.752 | - | - |
| ruBert-base finetune | 0.74 | 0.716 | - | - |
| ruRoberta-large finetune | 0.73 | 0.716 | - | - |
| ruBert-large finetune | 0.68 | 0.658 | - | - |
| RuGPT3XL few-shot | 0.67 | 0.665 | - | - |
| MT5 Large | 0.57 | 0.562 | mT5: A massively multilingual pre-trained text-to-text transformer | |
| SBERT_Large | 0.36 | 0.351 | - | - |
| SBERT_Large_mt_ru_finetuning | 0.35 | 0.347 | - | - |
| RuBERT plain | 0.32 | 0.314 | - | - |
| Multilingual Bert | 0.29 | 0.29 | - | - |
| Baseline TF-IDF1.1 | 0.26 | 0.252 | RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark | |
| heuristic majority | 0.26 | 0.257 | Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks | - |
| majority_class | 0.25 | 0.247 | Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks | - |
| Random weighted | 0.25 | 0.247 | Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks | - |
| RuGPT3Medium | 0.23 | 0.224 | - | - |
| RuBERT conversational | 0.22 | 0.218 | - | - |