Common Sense Reasoning On Parus

Evaluation Metric

Accuracy
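Accuracy here is the fraction of examples where the model's chosen answer matches the gold label. A minimal sketch (the example data below is hypothetical, not taken from the benchmark; PARus itself is a binary-choice task):

```python
def accuracy(predictions, labels):
    """Fraction of predictions that exactly match the gold labels."""
    assert len(predictions) == len(labels), "mismatched lengths"
    correct = sum(p == g for p, g in zip(predictions, labels))
    return correct / len(labels)

# Hypothetical predictions vs. gold labels for five binary-choice items.
preds = [0, 1, 1, 0, 1]
gold = [0, 1, 0, 0, 1]
print(accuracy(preds, gold))  # 0.8
```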

Evaluation Results

Performance of the various models on this benchmark

| Model | Accuracy | Paper Title | Repository |
| --- | --- | --- | --- |
| Human Benchmark | 0.982 | RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark | - |
| Golden Transformer | 0.908 | - | - |
| YaLM 1.0B few-shot | 0.766 | - | - |
| RuGPT3XL few-shot | 0.676 | - | - |
| ruT5-large-finetune | 0.66 | - | - |
| RuGPT3Medium | 0.598 | - | - |
| RuGPT3Large | 0.584 | - | - |
| RuBERT plain | 0.574 | - | - |
| RuGPT3Small | 0.562 | - | - |
| ruT5-base-finetune | 0.554 | - | - |
| Multilingual Bert | 0.528 | - | - |
| ruRoberta-large finetune | 0.508 | - | - |
| RuBERT conversational | 0.508 | - | - |
| MT5 Large | 0.504 | mT5: A massively multilingual pre-trained text-to-text transformer | - |
| majority_class | 0.498 | Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks | - |
| SBERT_Large | 0.498 | - | - |
| SBERT_Large_mt_ru_finetuning | 0.498 | - | - |
| ruBert-large finetune | 0.492 | - | - |
| Baseline TF-IDF1.1 | 0.486 | RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark | - |
| Random weighted | 0.48 | Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks | - |