HateBERT: Retraining BERT for Abusive Language Detection in English

Tommaso Caselli Valerio Basile Jelena Mitrović Michael Granitzer

Abstract

In this paper, we introduce HateBERT, a re-trained BERT model for abusive language detection in English. The model was trained on RAL-E, a large-scale dataset of Reddit comments in English from communities banned for being offensive, abusive, or hateful that we have collected and made available to the public. We present the results of a detailed comparison between a general pre-trained language model and the abuse-inclined version obtained by retraining with posts from the banned communities on three English datasets for offensive, abusive language and hate speech detection tasks. In all datasets, HateBERT outperforms the corresponding general BERT model. We also discuss a battery of experiments comparing the portability of the generic pre-trained language model and its corresponding abusive language-inclined counterpart across the datasets, indicating that portability is affected by compatibility of the annotated phenomena.

Code Repositories

tommasoc80/HateBERT (official)

Benchmarks

Benchmark                                  Methodology  Metrics
hate-speech-detection-on-abuseval          HateBERT     Macro F1: 0.742
hate-speech-detection-on-abuseval          BERT         Macro F1: 0.724
hate-speech-detection-on-hateval           BERT         Macro F1: 0.48
hate-speech-detection-on-hateval           HateBERT     Macro F1: 0.494
hate-speech-detection-on-offenseval-2019   HateBERT     Macro F1: 0.805
hate-speech-detection-on-offenseval-2019   BERT         Macro F1: 0.803
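
The benchmark scores above are reported as Macro F1. As a rough illustration, the sketch below assumes the standard definition (the unweighted mean of per-class F1 scores); the page does not spell out the evaluation code, so the function name `macro_f1`, the toy labels, and the `reported` dictionary layout are hypothetical. The second half only subtracts the BERT score from the HateBERT score for each benchmark listed above.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (standard macro F1; assumed definition)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted instances contributes 0.0 here.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy labels (hypothetical): 1 = abusive, 0 = not abusive.
toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 0, 0, 1]
toy_score = macro_f1(toy_true, toy_pred)

# Macro F1 values copied from the benchmark table above.
reported = {
    "abuseval": {"BERT": 0.724, "HateBERT": 0.742},
    "hateval": {"BERT": 0.48, "HateBERT": 0.494},
    "offenseval-2019": {"BERT": 0.803, "HateBERT": 0.805},
}

# Absolute gain of HateBERT over BERT per benchmark, rounded to 3 decimals.
gains = {
    name: round(scores["HateBERT"] - scores["BERT"], 3)
    for name, scores in reported.items()
}

if __name__ == "__main__":
    print(f"toy macro F1: {toy_score:.3f}")
    for name, gain in gains.items():
        print(f"{name}: +{gain:.3f}")
```

On the listed numbers, the gain is largest on AbusEval and smallest on OffensEval 2019, which is consistent with the abstract's statement that HateBERT outperforms BERT on all three datasets.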
