Burbi Giovanni ; Baldrati Alberto ; Agnolucci Lorenzo ; Bertini Marco ; Del Bimbo Alberto

Abstract
Multimodal image-text memes are prevalent on the internet, serving as a unique form of communication that combines visual and textual elements to convey humor, ideas, or emotions. However, some memes take a malicious turn, promoting hateful content and perpetuating discrimination. Detecting hateful memes within this multimodal context is a challenging task that requires understanding the intertwined meaning of text and images. In this work, we address this issue by proposing a novel approach named ISSUES for multimodal hateful meme classification. ISSUES leverages a pre-trained CLIP vision-language model and the textual inversion technique to effectively capture the multimodal semantic content of the memes. The experiments show that our method achieves state-of-the-art results on the Hateful Memes Challenge and HarMeme datasets. The code and the pre-trained models are publicly available at https://github.com/miccunifi/ISSUES.
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| hateful-meme-classification-on-harmeme | ISSUES | AUROC: 92.83; Accuracy: 81.64 |
| meme-classification-on-hateful-memes | ISSUES | ROC-AUC: 0.855 |
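
For readers unfamiliar with the metrics in the table, the following is a minimal sketch of how AUROC (also written ROC-AUC) and accuracy can be computed for a binary hateful / not-hateful classifier. It is not taken from the ISSUES codebase: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration. The two table rows also appear to use different scales (92.83 reads as a percentage, 0.855 as a fraction), so the sketch returns fractions in [0, 1] that can be multiplied by 100 for comparison.

```python
def roc_auc(labels, scores):
    """AUROC as the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one
    (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label.
    The 0.5 threshold is an assumption, not a value from the paper."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data: 1 = hateful, 0 = not hateful.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.8, 0.1]

print(f"AUROC:    {roc_auc(labels, scores):.4f}")
print(f"Accuracy: {accuracy(labels, scores):.4f}")
```

AUROC depends only on how the scores rank the examples, so it is unaffected by the choice of threshold, whereas accuracy changes with it. The pairwise loop is quadratic in the number of examples and is meant for clarity rather than for large evaluation sets.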