Wu Yue, Hassner Tal, Kim KangGeon, Medioni Gerard, Natarajan Prem
IEEE Trans Pattern Anal Mach Intell. 2018 Dec;40(12):3067-3074. doi: 10.1109/TPAMI.2017.2787130. Epub 2017 Dec 25.
This paper addresses the problem of facial landmark detection. We provide a novel analysis of the features produced at intermediate layers of a convolutional neural network (CNN) trained to regress facial landmark coordinates. This analysis shows that, while being processed by the CNN, face images can be partitioned in an unsupervised manner into subsets containing faces with similar poses (i.e., 3D views) and similar facial properties (e.g., presence or absence of eyewear). Based on this finding, we describe a novel CNN architecture specialized to regress the facial landmark coordinates of faces in specific poses and appearances. To address the shortage of training data, particularly for extreme profile poses, we additionally present data augmentation techniques designed to provide sufficient training examples for each of these specialized sub-networks. The proposed Tweaked CNN (TCNN) architecture is shown to outperform existing landmark detection methods in an extensive battery of tests on the AFW, AFLW, and 300W benchmarks. Finally, to promote reproducibility of our results, we make code and trained models publicly available through our project webpage.
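The routing idea described in the abstract (partition faces by their intermediate features, then hand each subset to a specialized regressor) can be illustrated with a small sketch. This is not the authors' implementation. It assumes plain k-means as a stand-in for the unsupervised partitioning, since the abstract does not name the clustering algorithm. It uses synthetic 2-D vectors in place of real intermediate CNN features. It also replaces each specialized sub-network with a trivial per-cluster mean predictor. All function names (`kmeans`, `fit_specialists`, `predict`, `demo`) are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, feature):
    """Index of the centroid closest to the feature vector."""
    return min(range(len(centroids)),
               key=lambda k: squared_distance(centroids[k], feature))


def kmeans(features, k, iterations=20, seed=0):
    """Plain k-means; a stand-in for the unsupervised partitioning step."""
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(features, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for f in features:
            groups[nearest(centroids, f)].append(f)
        for idx, group in enumerate(groups):
            if group:  # keep the previous centroid if a cluster is empty
                centroids[idx] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def fit_specialists(features, landmarks, centroids):
    """One stand-in 'specialist' per cluster: the mean landmark vector of the
    training samples routed to it (a real system would train or fine-tune a
    regressor per cluster instead)."""
    groups = [[] for _ in centroids]
    for f, y in zip(features, landmarks):
        groups[nearest(centroids, f)].append(y)
    return [[sum(col) / len(g) for col in zip(*g)] if g else None
            for g in groups]


def predict(feature, centroids, specialists):
    """Route the feature to its cluster and return that specialist's output."""
    return specialists[nearest(centroids, feature)]


def demo():
    # Synthetic data: two well-separated feature blobs, each with its own
    # (made-up) landmark target.
    rng = random.Random(1)
    features, landmarks = [], []
    for centre, target in (((0.0, 0.0), [0.3, 0.5]), ((10.0, 10.0), [0.7, 0.5])):
        for _ in range(20):
            features.append([centre[0] + rng.uniform(-0.5, 0.5),
                             centre[1] + rng.uniform(-0.5, 0.5)])
            landmarks.append(list(target))
    centroids = kmeans(features, k=2)
    specialists = fit_specialists(features, landmarks, centroids)
    return (predict([0.1, -0.2], centroids, specialists),
            predict([9.8, 10.3], centroids, specialists))


if __name__ == "__main__":
    print(demo())
```

The sketch only shows the control flow: features are clustered without labels, and a test sample is dispatched to the regressor of its nearest cluster. In a real pipeline, the features would come from an intermediate CNN layer, and each specialist would be a trained sub-network.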