Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH 43210, USA.
IEEE Trans Pattern Anal Mach Intell. 2010 Nov;32(11):2022-38. doi: 10.1109/TPAMI.2010.28.
The appearance-based approach to face detection has seen great advances in the last several years. In this approach, we learn the image statistics describing the texture pattern (appearance) of the object class we want to detect, e.g., the face. However, this approach has had limited success in providing an accurate and detailed description of the internal facial features, i.e., eyes, brows, nose, and mouth. In general, this is due to the limited information carried by the learned statistical model. While the face template is relatively rich in texture, facial features (e.g., eyes, nose, and mouth) do not carry enough discriminative information to tell them apart from all possible background images. We resolve this problem by incorporating the context information of each facial feature into the design of the statistical model. In the proposed approach, the context information defines the image statistics most correlated with the surroundings of each facial component. This means that when we search for a face or facial feature, we look for those locations that most resemble the feature yet are most dissimilar to its context. This dissimilarity with the context features forces the detector to gravitate toward an accurate estimate of the position of the facial feature. Learning to discriminate between feature and context templates is difficult, however, because the context and the texture of the facial features vary widely under changing expression, pose, and illumination, and may even resemble one another. We address this problem with the use of subclass divisions. We derive two algorithms to automatically divide the training samples of each facial feature into a set of subclasses, each representing a distinct construction of the same facial component (e.g., closed versus open eyes) or of its context (e.g., different hairstyles). The first algorithm is based on a discriminant analysis formulation. The second is an extension of the AdaBoost approach. We provide extensive experimental results using still images and video sequences, for a total of 3,930 images. We show that the results are almost as good as those obtained with manual detection.
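To make the feature-versus-context search concrete, here is a minimal sketch of the scoring idea described above: a candidate window is ranked by its resemblance to a learned feature template minus its best resemblance to templates learned from that feature's surroundings, so the score peaks on the feature itself rather than drifting onto nearby context. This is an illustration only, not the paper's classifier; the normalized-correlation measure, the exhaustive scan, and all function names (ncc, feature_vs_context_score, detect) are assumptions introduced here.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two equally sized gray patches."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def feature_vs_context_score(patch, feature_tpl, context_tpls):
    """High when the patch resembles the feature template and is
    dissimilar to every template learned from the feature's surroundings."""
    return ncc(patch, feature_tpl) - max(ncc(patch, t) for t in context_tpls)

def detect(image, feature_tpl, context_tpls):
    """Exhaustive scan: return the top-left corner of the best-scoring window."""
    h, w = feature_tpl.shape
    best_score, best_xy = -np.inf, None
    for y in range(image.shape[0] - h + 1):
        for x in range(image.shape[1] - w + 1):
            s = feature_vs_context_score(image[y:y + h, x:x + w],
                                         feature_tpl, context_tpls)
            if s > best_score:
                best_score, best_xy = s, (x, y)
    return best_xy, best_score
```

In this toy setting, the context templates would be patches sampled around the ground-truth feature locations at training time; subtracting the best context match is what pulls the maximum away from the surround and onto an accurate estimate of the feature position.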
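The subclass machinery can be sketched in the same spirit. The paper derives its own discriminant-analysis formulation and an AdaBoost extension; the toy version below substitutes off-the-shelf pieces, k-means to split each class into subclasses and a multiclass linear discriminant trained on the resulting subclass labels, so every name and parameter here (split_into_subclasses, k_feat, k_ctx) is an assumption, not the authors' algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def split_into_subclasses(X, n_subclasses):
    """Cluster one class's samples so each cluster captures a distinct
    construction of the same component (e.g., open vs. closed eyes)."""
    km = KMeans(n_clusters=n_subclasses, n_init=10, random_state=0).fit(X)
    return km.labels_

def train_subclass_discriminant(X_feat, X_ctx, k_feat=3, k_ctx=3):
    """Fit a discriminant over subclass labels rather than the raw
    feature-vs-context labels, so multimodal classes stay separable."""
    sub_feat = split_into_subclasses(X_feat, k_feat)         # labels 0..k_feat-1
    sub_ctx = split_into_subclasses(X_ctx, k_ctx) + k_feat   # labels k_feat..k_feat+k_ctx-1
    X = np.vstack([X_feat, X_ctx])
    y = np.concatenate([sub_feat, sub_ctx])
    lda = LinearDiscriminantAnalysis().fit(X, y)
    return lda, set(range(k_feat))

def is_feature(lda, feature_subclasses, x):
    """Declare a patch a facial feature iff its predicted subclass
    belongs to the feature side of the division."""
    return int(lda.predict(x.reshape(1, -1))[0]) in feature_subclasses
```

Classifying a patch as a facial feature whenever its predicted subclass falls on the feature side mirrors the idea that each subclass represents one distinct construction of the same component (e.g., open versus closed eyes) or of its context (e.g., different hairstyles).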