
AUFormer: Vision Transformers are Parameter-Efficient Facial Action Unit Detectors

Kaishen Yuan, Zitong Yu, Xin Liu, Weicheng Xie, Huanjing Yue, Jingyu Yang

Abstract

Facial Action Units (AUs) are a vital concept in the realm of affective computing, and AU detection has long been an active research topic. Existing methods suffer from overfitting because they train a large number of learnable parameters on scarce AU-annotated datasets, or they rely heavily on substantial additional relevant data. Parameter-Efficient Transfer Learning (PETL) offers a promising paradigm for addressing these challenges, but its existing methods are not designed around AU characteristics. We therefore apply the PETL paradigm to AU detection, introducing AUFormer and proposing a novel Mixture-of-Knowledge Expert (MoKE) collaboration mechanism. An individual MoKE specific to a certain AU, with minimal learnable parameters, first integrates personalized multi-scale and correlation knowledge. The MoKE then collaborates with the other MoKEs in its expert group to obtain aggregated information and injects it into the frozen Vision Transformer (ViT) to achieve parameter-efficient AU detection. Additionally, we design a Margin-truncated Difficulty-aware Weighted Asymmetric Loss (MDWA-Loss), which encourages the model to focus more on activated AUs, differentiates the difficulty of unactivated AUs, and discards potentially mislabeled samples. Extensive experiments from various perspectives, including within-domain, cross-domain, data-efficiency, and micro-expression-domain evaluations, demonstrate AUFormer's state-of-the-art performance and robust generalization without relying on additional relevant data. The code for AUFormer is available at https://github.com/yuankaishen2001/AUFormer.
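The abstract only sketches the MoKE mechanism at a high level, so the following PyTorch snippet is a minimal illustration of the general PETL pattern it describes: lightweight per-AU adapters alongside a frozen ViT block, with their aggregated output injected back into the frozen features. The class names, the bottleneck design, and the simple averaging used for "collaboration" are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class MoKE(nn.Module):
    """One lightweight expert for a single AU (a bottleneck adapter)."""

    def __init__(self, dim: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)  # minimal learnable parameters
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Stand-in for the "personalized multi-scale and correlation
        # knowledge" integration described in the abstract.
        return self.up(self.act(self.down(x)))


class MoKEInjectedBlock(nn.Module):
    """Wraps one frozen ViT block with a collaborating group of MoKEs."""

    def __init__(self, vit_block: nn.Module, dim: int, num_aus: int = 12):
        super().__init__()
        self.block = vit_block
        for p in self.block.parameters():  # the ViT backbone stays frozen
            p.requires_grad_(False)
        self.experts = nn.ModuleList(MoKE(dim) for _ in range(num_aus))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.block(x)
        # "Collaboration" is modeled here as simple averaging of the expert
        # outputs; the paper's aggregation scheme is more elaborate.
        aggregated = torch.stack([e(h) for e in self.experts]).mean(dim=0)
        return h + aggregated  # inject aggregated knowledge into frozen features
```

Only the expert parameters require gradients, so the trainable footprint is a small fraction of the backbone, which is the point of the PETL paradigm.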
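Likewise, the MDWA-Loss is only characterized by its three goals. The sketch below assumes an asymmetric-loss-style parameterization: a focal term for activated AUs, a shifted and focused term that differentiates the difficulty of unactivated AUs, and a truncation that drops extremely confident negatives as potentially mislabeled. The hyperparameters `gamma_pos`, `gamma_neg`, `shift`, and `trunc` are illustrative guesses, not the paper's values.

```python
import torch


def mdwa_loss(logits: torch.Tensor, targets: torch.Tensor,
              gamma_pos: float = 1.0, gamma_neg: float = 4.0,
              shift: float = 0.05, trunc: float = 0.9) -> torch.Tensor:
    """Hedged sketch of a margin-truncated, difficulty-aware asymmetric loss.

    logits, targets: (batch, num_aus); targets are 0/1 AU activation labels.
    """
    eps = 1e-8
    p = torch.sigmoid(logits)

    # Activated AUs: a focal-style term keeps the model focused on positives.
    loss_pos = targets * (1 - p) ** gamma_pos * torch.log(p.clamp_min(eps))

    # Unactivated AUs: shifting the probability de-emphasizes easy negatives,
    # and the focusing exponent gamma_neg differentiates their difficulty.
    p_m = (p - shift).clamp_min(0)
    loss_neg = (1 - targets) * p_m ** gamma_neg * torch.log((1 - p_m).clamp_min(eps))

    # Margin truncation: negatives the model insists are activated (p > trunc)
    # are treated as potentially mislabeled and dropped from the loss.
    loss_neg = loss_neg * (p <= trunc).float()

    return -(loss_pos + loss_neg).mean()
```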

Code Repositories

https://github.com/yuankaishen2001/AUFormer

Benchmarks

Benchmark | Methodology | Metrics
facial-action-unit-detection-on-bp4d | AUFormer | Average F1: 66.2
facial-action-unit-detection-on-disfa | AUFormer | Average F1: 66.4
