3 months ago

Joint Audio-Visual Deepfake Detection

Ser-Nam Lim, Yipin Zhou

Abstract

Deepfakes ("deep learning" + "fake") are videos synthetically generated by AI algorithms. While they can be entertaining, they can also be misused to falsify speeches and spread misinformation. Creating deepfakes involves both visual and auditory manipulation. Research on detecting visual deepfakes has produced a number of detection methods and datasets, whereas audio deepfakes (e.g., synthetic speech from text-to-speech or voice conversion systems) and the relationship between the visual and auditory modalities have been relatively neglected. In this work, we propose a novel joint visual/auditory deepfake detection task and show that exploiting the intrinsic synchronization between the visual and auditory modalities can benefit deepfake detection. Experiments demonstrate that the proposed joint detection framework outperforms independently trained models and, at the same time, generalizes better to unseen types of deepfakes.

Benchmarks

Benchmark: deepfake-detection-on-fakeavceleb-1
Methodology: AD DFD
Metrics: AP: 88.8, ROC AUC: 88.1
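
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how AP (average precision) and ROC AUC are commonly computed from per-video detector scores. It is not the authors' evaluation code. The function names, the label convention (1 = fake, 0 = real), and the toy scores are assumptions made for illustration, and the AP routine assumes no tied scores. The sketch returns values in [0, 1]; the figures above appear to be on a percentage scale.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random fake (label 1) scores higher than a
    random real (label 0); tied pairs count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values measured at the rank of each positive,
    with samples sorted by descending score (assumes no tied scores)."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / n_pos


# Hypothetical toy data: 1 = fake, 0 = real; scores are detector confidences.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

auc = roc_auc(toy_labels, toy_scores)
ap = average_precision(toy_labels, toy_scores)
print(f"ROC AUC: {auc:.3f}, AP: {ap:.3f}")
```

Both metrics are threshold-free: they depend only on how the detector ranks fake videos relative to real ones, which is why they are often used to compare detectors whose score scales differ.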
