Pro-Cap: Leveraging a Frozen Vision-Language Model for Hateful Meme Detection

Rui Cao, Ming Shan Hee, Adriel Kuek, Wen-Haw Chong, Roy Ka-Wei Lee, Jing Jiang

Abstract

Hateful meme detection is a challenging multimodal task that requires comprehension of both vision and language, as well as cross-modal interactions. Recent studies have tried to fine-tune pre-trained vision-language models (PVLMs) for this task. However, with increasing model sizes, it becomes important to leverage powerful PVLMs more efficiently, rather than simply fine-tuning them. Recently, researchers have attempted to convert meme images into textual captions and prompt language models for predictions. This approach has shown good performance but suffers from non-informative image captions. Considering the two factors mentioned above, we propose a probing-based captioning approach to leverage PVLMs in a zero-shot visual question answering (VQA) manner. Specifically, we prompt a frozen PVLM by asking hateful content-related questions and use the answers as image captions (which we call Pro-Cap), so that the captions contain information critical for hateful content detection. The good performance of models with Pro-Cap on three benchmarks validates the effectiveness and generalization of the proposed method.
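
The probing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the question wording, the function names (build_pro_cap, build_prompt, stub_vqa), and the way answers are joined are all assumptions made for illustration. The frozen PVLM is represented by any callable that maps an image and a question to an answer string, and a canned stub stands in for it here so the example runs without model weights.

```python
from typing import Callable, Sequence

# Hypothetical probing questions. The abstract only says the questions are
# "hateful content-related"; the exact wording below is an assumption.
PROBING_QUESTIONS = (
    "What is shown in the image?",
    "What is the religion of the person in the image?",
    "What is the nationality of the person in the image?",
)


def build_pro_cap(
    image: object,
    vqa: Callable[[object, str], str],
    questions: Sequence[str] = PROBING_QUESTIONS,
) -> str:
    """Ask each probing question zero-shot and join the answers into one caption."""
    answers = [vqa(image, question).strip() for question in questions]
    # Skip empty answers so the caption holds only informative text.
    return ". ".join(answer for answer in answers if answer)


def build_prompt(meme_text: str, pro_cap: str) -> str:
    """Combine the meme's overlaid text with the caption for a downstream classifier."""
    return f"Meme text: {meme_text}\nImage caption: {pro_cap}"


def stub_vqa(image: object, question: str) -> str:
    """Stand-in for a frozen PVLM; returns canned answers and ignores the image."""
    canned = {
        "What is shown in the image?": "a man holding a sign",
        "What is the nationality of the person in the image?": "unknown",
    }
    return canned.get(question, "")


caption = build_pro_cap(image=None, vqa=stub_vqa)
prompt = build_prompt("example meme text", caption)
```

In practice the stub would be replaced by a call into a frozen vision-language model whose weights are never updated; only the downstream classifier that reads the prompt would be trained.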

Code Repositories

social-ai-studio/pro-cap (official, PyTorch)
abril4416/kgen_vqa (PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
meme-classification-on-hateful-memes | Pro-Cap | Accuracy: 0.723, ROC-AUC: 0.809
