
Abstract
Detecting toxic comments on social media is essential for content moderation. This paper compares a range of models on a highly skewed, multi-label hate-speech dataset. The comparison accounts for inference time as well as several metrics that measure performance and bias. The results show that all BERT-based models perform similarly, regardless of model size, degree of optimization, or the language used for pretraining. RNNs are much faster at inference than any BERT variant. The BiLSTM remains a good trade-off between performance and inference time. RoBERTa with Focal Loss achieves the best performance on bias and AUROC (Area Under the Receiver Operating Characteristic Curve). DistilBERT, however, combines a high AUROC with a short inference time. All models are affected by identity-term association bias, with BERT, RNN, and XLNet being less sensitive to it than the CNN and Compact Convolutional Transformer models.
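As a point of reference for the "RoBERTa Focal Loss" configuration, the sketch below shows the binary focal loss commonly used for multi-label toxicity heads on skewed data. It is a minimal illustration, not the paper's implementation: the `gamma = 2.0` default and the unweighted formulation are assumptions, and the repository may use different hyperparameters.

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(logits: torch.Tensor,
                      targets: torch.Tensor,
                      gamma: float = 2.0) -> torch.Tensor:
    """Focal loss for multi-label classification (Lin et al., 2017).

    Down-weights well-classified examples so the dominant negative class
    of a skewed dataset contributes less to the gradient. gamma = 2.0 is
    the common default; the paper's exact setting is an assumption here.
    """
    # Element-wise binary cross-entropy per label.
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    # p_t = p for positive labels, 1 - p for negative labels.
    probas = torch.sigmoid(logits)
    p_t = targets * probas + (1.0 - targets) * (1.0 - probas)
    # Modulating factor (1 - p_t)^gamma shrinks the loss of easy examples.
    loss = (1.0 - p_t) ** gamma * bce
    return loss.mean()

# Hypothetical shapes: 4 comments, 6 toxicity sub-labels.
logits = torch.randn(4, 6)
targets = torch.randint(0, 2, (4, 6)).float()
print(binary_focal_loss(logits, targets))
```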
Code Repository
Nigiva/hatespeech-detection-models
Official
pytorch
Benchmarks
| Benchmark | Method | AUROC | GMB BNSP | GMB BPSN | GMB Subgroup | Macro F1 | Micro F1 | Precision | Recall |
|---|---|---|---|---|---|---|---|---|---|
| toxic-comment-classification-on-civil | BiLSTM | - | - | - | 0.8636 | - | 0.5115 | 0.3572 | - |
| toxic-comment-classification-on-civil | Unfreeze Glove ResNet 44 | 0.966 | - | 0.8493 | 0.8421 | 0.4648 | 0.5958 | 0.4835 | 0.7759 |
| toxic-comment-classification-on-civil | Compact Convolutional Transformer (CCT) | 0.9526 | 0.9447 | 0.8307 | 0.8133 | 0.3428 | 0.4874 | 0.3507 | 0.7983 |
| toxic-comment-classification-on-civil | BiGRU | - | - | 0.8616 | - | - | - | - | - |
| toxic-comment-classification-on-civil | Freeze Glove ResNet 44 | - | - | 0.7876 | 0.8219 | 0.4189 | 0.5591 | 0.4631 | 0.7053 |
| toxic-comment-classification-on-civil | BERTweet | 0.979 | 0.9603 | 0.8945 | 0.878 | 0.3612 | 0.4928 | 0.3363 | 0.9216 |
| toxic-comment-classification-on-civil | XLNet | - | 0.9597 | 0.8834 | 0.8689 | 0.3336 | 0.4586 | 0.3045 | 0.9254 |
| toxic-comment-classification-on-civil | XLM RoBERTa | - | - | 0.8859 | - | - | 0.468 | 0.3135 | 0.923 |
| toxic-comment-classification-on-civil | DistilBERT | 0.9804 | 0.9644 | 0.874 | 0.8762 | 0.3879 | 0.5115 | 0.3572 | 0.9001 |
| toxic-comment-classification-on-civil | RoBERTa Focal Loss | 0.9818 | 0.9581 | 0.901 | 0.8807 | 0.4648 | 0.5524 | 0.4017 | 0.8839 |
| toxic-comment-classification-on-civil | RoBERTa BCE | 0.9813 | 0.9616 | 0.8901 | 0.88 | 0.4749 | 0.5359 | 0.3836 | 0.8891 |
| toxic-comment-classification-on-civil | Unfreeze Glove ResNet 56 | 0.9639 | - | 0.8445 | 0.8487 | 0.3778 | - | - | 0.8707 |
| toxic-comment-classification-on-civil | HateBERT | 0.9791 | 0.9589 | 0.8915 | 0.8744 | 0.3679 | 0.4844 | 0.3297 | 0.9165 |
| toxic-comment-classification-on-civil | AlBERT | 0.979 | 0.9499 | 0.8982 | 0.8734 | 0.3541 | 0.4845 | 0.3247 | 0.9104 |
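The GMB columns are generalized-mean bias AUCs in the style of the Jigsaw Unintended Bias evaluation: per-identity Subgroup, BPSN (background-positive, subgroup-negative), and BNSP (background-negative, subgroup-positive) AUCs aggregated with a power mean. The sketch below is an assumption-laden illustration of that computation, not the repository's code: the power `p = -5` follows the Jigsaw competition default, and `subgroup_masks` is a hypothetical input mapping identity terms to boolean masks over the test set.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def generalized_mean(aucs, p=-5):
    # Power mean over per-identity AUCs; p = -5 (Jigsaw default, assumed here)
    # emphasises the worst-performing identity subgroups.
    aucs = np.asarray(aucs, dtype=float)
    return np.power(np.mean(np.power(aucs, p)), 1.0 / p)

def gmb_metrics(y_true, y_score, subgroup_masks, p=-5):
    """Compute GMB Subgroup / BPSN / BNSP AUCs.

    y_true: binary toxicity labels, y_score: model scores,
    subgroup_masks: dict mapping identity name -> boolean mask (hypothetical).
    """
    sub_aucs, bpsn_aucs, bnsp_aucs = [], [], []
    for mask in subgroup_masks.values():
        # Subgroup AUC: only comments that mention the identity.
        sub_aucs.append(roc_auc_score(y_true[mask], y_score[mask]))
        # BPSN: background positives + subgroup negatives.
        bpsn = (mask & (y_true == 0)) | (~mask & (y_true == 1))
        bpsn_aucs.append(roc_auc_score(y_true[bpsn], y_score[bpsn]))
        # BNSP: background negatives + subgroup positives.
        bnsp = (mask & (y_true == 1)) | (~mask & (y_true == 0))
        bnsp_aucs.append(roc_auc_score(y_true[bnsp], y_score[bnsp]))
    return {
        "GMB Subgroup": generalized_mean(sub_aucs, p),
        "GMB BPSN": generalized_mean(bpsn_aucs, p),
        "GMB BNSP": generalized_mean(bnsp_aucs, p),
    }
```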