IEEE Trans Pattern Anal Mach Intell. 2018 Apr;40(4):987-1001. doi: 10.1109/TPAMI.2017.2697958. Epub 2017 Apr 25.
Face alignment is an important task in computer vision. Regression-based methods currently dominate this problem; they generally employ a series of mapping functions from the face appearance to iteratively update the face shape hypothesis. A key question is thus how to perform this regression procedure. In this work, we formulate the regression procedure as a sparse coding problem. We learn two relational dictionaries, one for the face appearance and the other for the face shape, with coupled reconstruction coefficients that capture their underlying relationships. To deploy this model for face alignment, we derive the relational dictionaries in a stage-wise manner to perform closed-loop refinement of each other: the face appearance dictionary is first learned from the face shape dictionary and then used to update the face shape hypothesis, and the face shape dictionary updated from the new shape hypothesis is in turn used to refine the face appearance dictionary. To improve accuracy, we extend this model hierarchically from the whole face shape to face part shapes, so that both the global and local view variations of a face are captured. To locate facial landmarks under occlusion, we further introduce an occlusion dictionary into the face appearance dictionary to recover the face shape from partially occluded face appearance. The occlusion dictionary is learned in a data-driven manner from background images and represents a set of elemental occlusion patterns, a sparse combination of which models various practical partial face occlusions. By integrating these technical innovations, we obtain a robust and accurate approach to locating facial landmarks under different face views and possibly severe occlusions in face images in the wild.
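The coupled-dictionary idea above can be sketched in code. The following is a minimal illustration, not the paper's actual optimization: it learns an appearance dictionary `Da` and a shape dictionary `Ds` that share one sparse code matrix `A` (the "coupled reconstruction coefficients"), using standard ISTA steps for the codes and ridge least-squares updates for the dictionaries. All names, dimensions, and the solver choice are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_threshold(z, lam):
    """Proximal operator of the l1 norm, used by ISTA."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def coupled_sparse_coding(X, S, n_atoms=16, lam=0.1, n_iter=50):
    """Illustrative coupled dictionary learning:
    minimize ||X - Da A||^2 + ||S - Ds A||^2 + lam * ||A||_1
    where the shared code matrix A ties appearance to shape."""
    d_a, n = X.shape
    d_s, _ = S.shape
    Da = rng.standard_normal((d_a, n_atoms))
    Ds = rng.standard_normal((d_s, n_atoms))
    A = np.zeros((n_atoms, n))
    for _ in range(n_iter):
        # stack both modalities so one code reconstructs both
        D = np.vstack([Da, Ds])
        Y = np.vstack([X, S])
        # ISTA step on the shared codes
        L = np.linalg.norm(D, 2) ** 2 + 1e-8  # Lipschitz constant
        grad = D.T @ (D @ A - Y)
        A = soft_threshold(A - grad / L, lam / L)
        # ridge least-squares dictionary updates
        G = A @ A.T + 1e-6 * np.eye(n_atoms)
        Da = X @ A.T @ np.linalg.inv(G)
        Ds = S @ A.T @ np.linalg.inv(G)
        # normalize the stacked atoms for stability
        norms = np.linalg.norm(np.vstack([Da, Ds]), axis=0) + 1e-12
        Da /= norms
        Ds /= norms
    return Da, Ds, A

# toy data: 32-dim appearance features and 10-dim shapes for 100 samples
X = rng.standard_normal((32, 100))
S = rng.standard_normal((10, 100))
Da, Ds, A = coupled_sparse_coding(X, S)
```

At inference, a new appearance vector would be sparse-coded over `Da` and the shape update predicted as `Ds @ a`, which is the mechanism that lets the two dictionaries refine each other stage by stage.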
Extensive experimental analyses and evaluations on different benchmark datasets, as well as on two new datasets that we built, demonstrate the robustness and accuracy of the proposed model, especially for face images with large view variations and/or severe occlusions.
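The occlusion handling described in the abstract can likewise be sketched: an occluded appearance vector is coded over the concatenation of the face appearance dictionary and the occlusion dictionary, so the occluder is absorbed by the occlusion atoms and the shape is recovered from the face coefficients alone. This is a hedged illustration under assumed names (`Da`, `Do`) and a generic ISTA solver, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(1)

def soft_threshold(z, lam):
    """Proximal operator of the l1 norm, used by ISTA."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def occlusion_aware_code(x, Da, Do, lam=0.05, n_iter=200):
    """Sparse-code x over [Da | Do]: the Da part explains the face,
    the Do part (atoms learned from background patches) explains the
    occluder, so shape recovery can use the Da coefficients only."""
    D = np.hstack([Da, Do])
    c = np.zeros(D.shape[1])
    L = np.linalg.norm(D, 2) ** 2 + 1e-8  # Lipschitz constant
    for _ in range(n_iter):
        grad = D.T @ (D @ c - x)
        c = soft_threshold(c - grad / L, lam / L)
    a = c[:Da.shape[1]]   # face-appearance coefficients
    o = c[Da.shape[1]:]   # occlusion coefficients
    return a, o

# toy setup: 64-dim features, 16 face atoms, 8 occlusion atoms
Da = rng.standard_normal((64, 16)); Da /= np.linalg.norm(Da, axis=0)
Do = rng.standard_normal((64, 8));  Do /= np.linalg.norm(Do, axis=0)
x = Da @ rng.standard_normal(16) * 0.5 + Do[:, 0] * 2.0  # face + occluder
a, o = occlusion_aware_code(x, Da, Do)
```

The design point this sketch captures is that occlusion robustness comes from representation, not detection: a sparse combination of a few elemental occlusion atoms can absorb many real-world occluders without contaminating the face coefficients used for shape recovery.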