Department of Electronics and Information Engineering, North China University of Technology, Beijing 100144, China.
Department of Software, Beihang University, Beijing 100191, China.
Sensors (Basel). 2018 Jun 11;18(6):1906. doi: 10.3390/s18061906.
Face recognition/verification has received great attention in both theory and application over the past two decades. Deep learning has recently been regarded as a very powerful tool for improving face recognition/verification performance. Given large labeled training datasets, the features obtained from deep networks achieve higher accuracy than those from shallow networks. However, many reported face recognition/verification approaches rely heavily on the large size and representational completeness of the training set, and most of them suffer a serious performance drop, or even fail to work, when few training samples per person are available. A small number of training samples can therefore cause the deep features to vary greatly. We aim to solve this critical problem in this paper. Inspired by recent research on scene domain transfer, for a given face image, a new series of possible scenarios for this face can be deduced from the scene semantics extracted from the other individuals in a face dataset. We believe that the "scene" or background in an image, that is, having samples of a given person across more distinct scenes, may help determine the intrinsic features shared among faces of the same individual. To validate this belief, we propose a Bayesian scene-prior-based deep learning model that aims to extract important features from background scenes. By learning a scene model on a labeled face dataset via the Bayesian framework, the proposed method transforms a face image into new face images of the given face using the learned scene dictionary. Because the newly derived faces have scenes similar to the input face, face-verification performance can be improved without suffering from background variance, while the number of required training samples is significantly reduced.
Experiments conducted on the Labeled Faces in the Wild (LFW) dataset view #2 subset show that this model increases verification accuracy to 99.2% by means of scene transfer learning (versus 99.12% reported in the literature under an unsupervised protocol). Meanwhile, our model achieves 94.3% accuracy on the YouTube Faces database (DB) (versus 93.2% in the literature under an unsupervised protocol).
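The augmentation idea described above can be illustrated with a minimal sketch. The following is not the authors' implementation: it assumes face embeddings decompose additively into an identity component and a scene component, uses plain k-means as a stand-in for the Bayesian scene dictionary, and an isotropic-Gaussian likelihood with a uniform prior for the scene posterior. All shapes, function names, and the toy data are illustrative assumptions.

```python
# Hypothetical sketch of scene-prior-based augmentation: learn a "scene
# dictionary" from background descriptors, infer a face's scene posterior
# via Bayes' rule, then synthesize pseudo-samples by swapping scene atoms.
import numpy as np

rng = np.random.default_rng(0)

def learn_scene_dictionary(scene_descs, k=4, iters=20):
    """Cluster scene descriptors into k atoms (plain k-means as a
    stand-in for the paper's Bayesian scene model)."""
    centers = scene_descs[rng.choice(len(scene_descs), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(scene_descs[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = scene_descs[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def scene_posterior(desc, centers, sigma=1.0):
    """Posterior over scene atoms for one descriptor, assuming an
    isotropic Gaussian likelihood and a uniform prior."""
    logp = -np.sum((centers - desc) ** 2, axis=1) / (2 * sigma ** 2)
    p = np.exp(logp - logp.max())
    return p / p.sum()

def augment(face_emb, scene_desc, centers, sigma=1.0):
    """Synthesize one pseudo-sample per scene atom by replacing the
    face's expected scene component with each dictionary atom."""
    post = scene_posterior(scene_desc, centers, sigma)
    expected_scene = post @ centers       # posterior-weighted scene
    identity = face_emb - expected_scene  # crude identity residual
    return np.array([identity + c for c in centers])

# Toy data: 32-D scene descriptors drawn around two distinct "scenes".
scenes = np.vstack([rng.normal(0, 1, (50, 32)), rng.normal(5, 1, (50, 32))])
D = learn_scene_dictionary(scenes, k=2)
face = rng.normal(0, 1, 32)
new_faces = augment(face, scenes[0], D)
print(new_faces.shape)  # one synthetic sample per scene atom: (2, 32)
```

In this toy version, augmenting each training face with one pseudo-sample per scene atom multiplies the effective sample count per person, which is the mechanism the abstract credits for reducing the required number of real training samples.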