8 months ago

Computer Vision

Emotion Recognition

Multi-Task Learning

Method/Architecture

Zhixi Cai, Shreya Ghosh, Kalin Stefanov, Abhinav Dhall, Jianfei Cai, Hamid Rezatofighi, Reza Haffari, Munawar Hayat

Abstract

This paper proposes a self-supervised approach to learn universal facial representations from videos that can transfer across a variety of facial analysis tasks such as Facial Attribute Recognition (FAR), Facial Expression Recognition (FER), DeepFake Detection (DFD), and Lip Synchronization (LS). Our proposed framework, named MARLIN, is a facial video masked autoencoder that learns highly robust and generic facial embeddings from abundantly available non-annotated web-crawled facial videos. As a challenging auxiliary task, MARLIN reconstructs the spatio-temporal details of the face from the densely masked facial regions, which mainly include eyes, nose, mouth, lips, and skin, to capture local and global aspects that in turn help in encoding generic and transferable features. Through a variety of experiments on diverse downstream tasks, we demonstrate MARLIN to be an excellent facial video encoder as well as feature extractor that performs consistently well across a variety of downstream tasks including FAR (1.13% gain over supervised benchmark), FER (2.64% gain over unsupervised benchmark), DFD (1.86% gain over unsupervised benchmark), LS (29.36% gain for Frechet Inception Distance), and even in the low-data regime. Our code and models are available at https://github.com/ControlNet/MARLIN .
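To make the masked-autoencoder idea in the abstract concrete, here is a minimal sketch of dense masking over a video's patch grid and a reconstruction loss computed only on the masked patches. It is not the authors' implementation: the function names `tube_mask` and `masked_mse`, the 0.9 mask ratio, the uniform random choice of patches, and the use of one scalar per patch are all assumptions made for illustration. In particular, MARLIN masks facial regions (eyes, nose, mouth, lips, skin), whereas this sketch picks patches uniformly at random.

```python
import random


def tube_mask(num_frames, grid_h, grid_w, mask_ratio=0.9, seed=0):
    """Build a boolean mask [frame][row][col]; True means the patch is hidden.

    The same spatial pattern is repeated in every frame ("tube" masking),
    so a hidden patch cannot be copied from a neighbouring frame.
    mask_ratio=0.9 is an assumed value, not one taken from the paper.
    """
    num_patches = grid_h * grid_w
    num_masked = round(mask_ratio * num_patches)
    rng = random.Random(seed)
    # Uniform sampling without replacement (assumption: no face-region guidance).
    masked = set(rng.sample(range(num_patches), num_masked))
    frame = [[(r * grid_w + c) in masked for c in range(grid_w)]
             for r in range(grid_h)]
    return [[row[:] for row in frame] for _ in range(num_frames)]


def masked_mse(pred, target, mask):
    """Mean squared error over masked patches only (one scalar per patch)."""
    total, count = 0.0, 0
    for t, frame in enumerate(mask):
        for r, row in enumerate(frame):
            for c, is_masked in enumerate(row):
                if is_masked:
                    total += (pred[t][r][c] - target[t][r][c]) ** 2
                    count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    mask = tube_mask(num_frames=8, grid_h=14, grid_w=14)
    hidden = sum(cell for row in mask[0] for cell in row)
    print(f"hidden patches per frame: {hidden} of {14 * 14}")
```

In a real masked autoencoder the encoder would see only the visible patches and a decoder would predict pixel values for the hidden ones; the loss above stands in for that reconstruction objective.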

