MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset

Dan Saattrup Nielsen, Ryan McConville

Abstract

Misinformation is becoming increasingly prevalent on social media and in news articles. It has become so widespread that we require algorithmic assistance utilising machine learning to detect such content. Training these machine learning models requires datasets of sufficient scale, diversity and quality. However, datasets in the field of automatic misinformation detection are predominantly monolingual, include a limited number of modalities and are not of sufficient scale and quality. Addressing this, we develop a data collection and linking system (MuMiN-trawl) to build a public misinformation graph dataset (MuMiN), containing rich social media data (tweets, replies, users, images, articles, hashtags) spanning 21 million tweets belonging to 26 thousand Twitter threads, each of which has been semantically linked to 13 thousand fact-checked claims across dozens of topics, events and domains, in 41 different languages, spanning more than a decade. The dataset is made available as a heterogeneous graph via a Python package (mumin). We provide baseline results for two node classification tasks related to the veracity of a claim involving social media, and demonstrate that these are challenging tasks, with the highest macro-average F1-scores being 62.55% and 61.45% for the two tasks, respectively. The MuMiN ecosystem is available at https://mumin-dataset.github.io/, including the data, documentation, tutorials and leaderboards.

Code Repositories

MuMiN-dataset/mumin-trawl (PyTorch, mentioned in GitHub)
MuMiN-dataset/mumin-baseline (PyTorch, mentioned in GitHub)
MuMiN-dataset/mumin-build (official, PyTorch, mentioned in GitHub)

Benchmarks

Benchmark                             Methodology       Claim Classification Macro-F1   Tweet Classification Macro-F1
-----------------------------------   ---------------   -----------------------------   -----------------------------
node-classification-on-mumin-large    Random            0.3879                          0.3690
node-classification-on-mumin-large    HeteroGraphSAGE   0.5980                          0.6145
node-classification-on-mumin-large    Majority class    0.4813                          0.4887
node-classification-on-mumin-large    LaBSE             0.5790                          0.5280
node-classification-on-mumin-medium   HeteroGraphSAGE   0.5770                          0.5410
node-classification-on-mumin-medium   Majority class    0.4806                          0.4856
node-classification-on-mumin-medium   Random            0.3896                          0.3772
node-classification-on-mumin-medium   LaBSE             0.5585                          0.5745
node-classification-on-mumin-small    Majority class    0.4756                          0.4877
node-classification-on-mumin-small    HeteroGraphSAGE   0.5795                          0.5605
node-classification-on-mumin-small    LaBSE             0.6255                          0.5450
node-classification-on-mumin-small    Random            0.4007                          0.3718
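
The benchmark scores are macro-averaged F1, meaning the unweighted mean of the per-class F1 scores. The sketch below shows how that metric is computed and why a majority-class baseline can score below 0.5 when the classes are imbalanced. It is an illustration only: the function name macro_f1, the two label strings and the 9-to-1 toy split are assumptions made for this example, and none of it is taken from the MuMiN code or data.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2TP / (2TP + FP + FN); define it as 0 when the class
        # is neither present nor predicted.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels (not MuMiN data): 9 of 10 items share one class.
y_true = ["misinformation"] * 9 + ["factual"]

# Majority-class baseline: always predict the most frequent label.
majority_label = Counter(y_true).most_common(1)[0][0]
y_majority = [majority_label] * len(y_true)

majority_score = macro_f1(y_true, y_majority)
perfect_score = macro_f1(y_true, y_true)
print(f"majority baseline macro-F1: {majority_score:.4f}")
print(f"perfect prediction macro-F1: {perfect_score:.4f}")
```

Because the minority class contributes an F1 of zero under the majority baseline, the macro average is pulled below 0.5 even though plain accuracy would be 0.9 on this toy split. Macro-F1 is therefore a stricter yardstick than accuracy for imbalanced veracity labels.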
