IEEE Trans Pattern Anal Mach Intell. 2018 Apr;40(4):1002-1014. doi: 10.1109/TPAMI.2017.2700390. Epub 2017 May 2.
Human faces in surveillance videos often suffer from severe image blur, dramatic pose variations, and occlusion. In this paper, we propose a comprehensive framework based on Convolutional Neural Networks (CNN) to overcome the challenges in video-based face recognition (VFR). First, to learn blur-robust face representations, we artificially blur training data composed of clear still images to account for the shortfall of real-world video training data. Using training data composed of both still images and artificially blurred data, the CNN is encouraged to learn blur-insensitive features automatically. Second, to enhance the robustness of CNN features to pose variations and occlusion, we propose a Trunk-Branch Ensemble CNN model (TBE-CNN), which extracts complementary information from holistic face images and from patches cropped around facial components. TBE-CNN is an end-to-end model that extracts features efficiently by sharing the low- and middle-level convolutional layers between the trunk and branch networks. Third, to further promote the discriminative power of the representations learnt by TBE-CNN, we propose an improved triplet loss function. Systematic experiments justify the effectiveness of the proposed techniques. Most impressively, TBE-CNN achieves state-of-the-art performance on three popular video face databases: PaSC, COX Face, and YouTube Faces. With the proposed techniques, we also obtain first place in the BTAS 2016 Video Person Recognition Evaluation.
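To illustrate the first idea, below is a minimal sketch of artificially blurring clear still images during training. The abstract does not specify the blur models or parameters, so the Gaussian (defocus-like) and linear motion kernels, the kernel-size ranges, and the sampling probabilities here are illustrative assumptions, not the paper's exact settings.

```python
# Sketch of artificial blur augmentation for clear still face images.
# Kernel types, sizes, and probabilities are illustrative assumptions.
import random
import cv2
import numpy as np

def random_blur(image: np.ndarray) -> np.ndarray:
    """Return an artificially blurred copy of a clear still image (H x W x 3, uint8)."""
    choice = random.random()
    if choice < 1 / 3:
        # Keep a share of images unblurred so clear faces remain covered.
        return image.copy()
    if choice < 2 / 3:
        # Out-of-focus-style blur: Gaussian kernel with a random size.
        ksize = random.choice([3, 5, 7, 9])
        return cv2.GaussianBlur(image, (ksize, ksize), 0)
    # Motion-style blur: a linear kernel of random length and orientation.
    length = random.randint(5, 15)
    angle = random.uniform(0.0, 180.0)
    kernel = np.zeros((length, length), dtype=np.float32)
    kernel[length // 2, :] = 1.0                       # horizontal line kernel
    center = (length / 2 - 0.5, length / 2 - 0.5)
    rot = cv2.getRotationMatrix2D(center, angle, 1.0)  # rotate to the sampled angle
    kernel = cv2.warpAffine(kernel, rot, (length, length))
    kernel /= kernel.sum()
    return cv2.filter2D(image, -1, kernel)

# Usage sketch: blur still training images on the fly so the network sees both
# clear and artificially blurred versions of the same identities.
still = cv2.imread("still_face.jpg")                   # hypothetical file path
blurred = random_blur(still)
```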
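The trunk-branch sharing described above can be sketched as follows. This is a minimal PyTorch illustration under stated assumptions: layer widths, the number of branches, the patch locations, and the fusion layer are all hypothetical, and a standard triplet margin loss stands in for the paper's improved triplet loss, whose details are not given in the abstract.

```python
# Sketch of a trunk-branch ensemble with a shared low/mid-level convolutional stem.
# Architecture sizes and patch locations are illustrative assumptions.
import torch
import torch.nn as nn

class TrunkBranchEnsemble(nn.Module):
    """Trunk and branches share the low- and middle-level convolutional layers;
    the trunk processes the holistic face feature map, each branch processes a
    patch of that map around one facial component, and the outputs are fused."""
    def __init__(self, num_branches: int = 2, embed_dim: int = 256):
        super().__init__()
        # Shared low- and middle-level convolutional layers.
        self.shared_stem = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # High-level trunk layers over the whole shared feature map.
        self.trunk_head = self._make_head(embed_dim)
        # One set of high-level layers per facial-component branch.
        self.branch_heads = nn.ModuleList(
            [self._make_head(embed_dim) for _ in range(num_branches)]
        )
        self.fuse = nn.Linear(embed_dim * (1 + num_branches), embed_dim)

    @staticmethod
    def _make_head(embed_dim: int) -> nn.Sequential:
        return nn.Sequential(
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, face: torch.Tensor, patch_boxes) -> torch.Tensor:
        """face: (B, 3, H, W) aligned faces; patch_boxes: one (top, left, h, w)
        crop per branch, in shared-feature-map coordinates."""
        fmap = self.shared_stem(face)            # shared low/mid-level features
        feats = [self.trunk_head(fmap)]          # holistic (trunk) feature
        for head, (t, l, h, w) in zip(self.branch_heads, patch_boxes):
            feats.append(head(fmap[:, :, t:t + h, l:l + w]))  # component feature
        return self.fuse(torch.cat(feats, dim=1))             # fused representation

# Usage sketch: 128x128 faces give a 32x32 shared feature map; the two boxes
# below are hypothetical eye/mouth regions. A standard triplet margin loss is
# used here in place of the paper's improved variant.
model = TrunkBranchEnsemble(num_branches=2)
faces = torch.randn(4, 3, 128, 128)
boxes = [(4, 4, 12, 24), (18, 8, 10, 16)]
embeddings = model(faces, boxes)                 # shape (4, 256)
triplet = nn.TripletMarginLoss(margin=0.2)
loss = triplet(embeddings[0:1], embeddings[1:2], embeddings[2:3])
```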