Yinglin Zheng, Hao Yang, Ting Zhang, Jianmin Bao, Dongdong Chen, Yangyu Huang, Lu Yuan, Dong Chen, Ming Zeng, Fang Wen

Abstract
How can we learn a universal facial representation that boosts the performance of all face analysis tasks? This paper takes one step toward this goal. We systematically study the transfer performance of pre-trained models on face analysis tasks and propose FaRL, a framework for general Facial Representation Learning in a visual-linguistic manner. On one hand, the framework uses a contrastive loss to learn high-level semantic meaning from image-text pairs; on the other hand, to further strengthen the face representation, it incorporates masked image modeling to exploit low-level visual information at the same time. We pre-train on LAION-FACE, a dataset containing a large number of face image-text pairs, and evaluate the learned representation on multiple downstream tasks. Experiments show that FaRL achieves better transfer performance than previous pre-trained models, with a particularly clear advantage in low-data regimes. More importantly, the model surpasses state-of-the-art methods on several face analysis tasks, including face parsing and face alignment, demonstrating strong generalization and representation quality.
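To make the two objectives above concrete, here is a minimal PyTorch sketch of how a CLIP-style image-text contrastive loss and a BEiT-style masked-image-modeling loss over discrete visual tokens could be combined. The function names, the discrete-token MIM target, and the `lambda_mim` weight are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, logit_scale=100.0):
    """Symmetric InfoNCE over a batch of paired image/text embeddings.

    img_emb, txt_emb: (B, D) outputs of the image and text encoders.
    """
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = logit_scale * img_emb @ txt_emb.t()            # (B, B) similarities
    labels = torch.arange(logits.size(0), device=logits.device)
    # Matching pairs lie on the diagonal; classify in both directions.
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

def mim_loss(patch_logits, token_targets, mask):
    """Cross-entropy on masked patches only.

    patch_logits:  (B, P, V) predictions over a visual-token vocabulary.
    token_targets: (B, P) discrete token ids from an image tokenizer.
    mask:          (B, P) bool, True where a patch was masked out.
    """
    return F.cross_entropy(patch_logits[mask], token_targets[mask])

# Hypothetical combined objective; lambda_mim balances the two terms.
# total = contrastive_loss(img_emb, txt_emb) + lambda_mim * mim_loss(...)
```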
Code Repositories
FacePerceiver/FaRL
Official
pytorch
Mentioned in GitHub
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| face-alignment-on-300w | FaRL-B (epoch 16) | NME_inter-ocular (%, Challenge): 4.45; NME_inter-ocular (%, Common): 2.56; NME_inter-ocular (%, Full): 2.93; NME_inter-pupil (%, Challenge): 6.42; NME_inter-pupil (%, Common): 3.53; NME_inter-pupil (%, Full): 4.11 |
| face-alignment-on-300w | FaRL-B (epoch 64) | NME_inter-ocular (%, Challenge): 4.42; NME_inter-ocular (%, Common): 2.50; NME_inter-ocular (%, Full): 2.88; NME_inter-pupil (%, Challenge): 6.38; NME_inter-pupil (%, Common): 3.46; NME_inter-pupil (%, Full): 4.05 |
| face-alignment-on-aflw-19 | FaRL-B (epoch 16) | AUC_box@0.07 (%, Full): 81.3; NME_box (%, Full): 1.334; NME_diag (%, Frontal): 0.821; NME_diag (%, Full): 0.943 |
| face-alignment-on-wflw-extra-data | FaRL-B (epoch 16) | AUC@10 (inter-ocular): 61.16; FR@10 (inter-ocular): 1.76; NME (inter-ocular): 3.96 |
| face-parsing-on-celebamask-hq | FaRL-B | Mean F1: 89.56 |
| face-parsing-on-lapa | FaRL-B | Mean F1: 93.88 |
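For reference, the NME figures in the alignment rows are normalized mean errors: the mean per-landmark Euclidean error divided by a per-face normalization distance (inter-ocular: distance between the outer eye corners; inter-pupil: distance between the pupil centers), reported as a percentage. Below is a minimal NumPy sketch assuming 68-point 300W-style annotations; the landmark index conventions used here are the customary ones but are an assumption on my part.

```python
import numpy as np

def nme(pred, gt, norm):
    """Normalized mean error in percent.

    pred, gt: (N, 68, 2) predicted / ground-truth landmarks.
    norm:     (N,) per-image normalization distance.
    """
    per_point = np.linalg.norm(pred - gt, axis=-1)          # (N, 68) errors
    return (per_point.mean(axis=1) / norm).mean() * 100.0

def inter_ocular(gt):
    # Outer eye corners in the 68-point scheme (0-indexed: 36 and 45).
    return np.linalg.norm(gt[:, 36] - gt[:, 45], axis=-1)

def inter_pupil(gt):
    # Pupil centers approximated as the mean of each eye's six landmarks.
    left = gt[:, 36:42].mean(axis=1)
    right = gt[:, 42:48].mean(axis=1)
    return np.linalg.norm(left - right, axis=-1)
```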