Department of Psychology, University of Cambridge, Cambridge CB2 3EB, United Kingdom.
MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge CB2 7EF, United Kingdom.
Proc Natl Acad Sci U S A. 2022 Jul 5;119(27):e2115047119. doi: 10.1073/pnas.2115047119. Epub 2022 Jun 29.
Human vision is attuned to the subtle differences between individual faces. Yet we lack a quantitative way of predicting how similar two face images look and whether they appear to show the same person. Principal component-based three-dimensional (3D) morphable models are widely used to generate stimuli in face perception research. These models capture the distribution of real human faces in terms of dimensions of physical shape and texture. How well does a "face space" based on these dimensions capture the similarity relationships humans perceive among faces? To answer this, we designed a behavioral task to collect dissimilarity and same/different identity judgments for 232 pairs of realistic faces. Stimuli sampled geometric relationships in a face space derived from principal components of 3D shape and texture (Basel face model [BFM]). We then compared a wide range of models in their ability to predict the data, including the BFM from which faces were generated, an active appearance model derived from face photographs, and image-computable models of visual perception. Euclidean distance in the BFM explained both dissimilarity and identity judgments surprisingly well. In a comparison against 16 diverse models, BFM distance was competitive with representational distances in state-of-the-art deep neural networks (DNNs), including novel DNNs trained on BFM synthetic identities or BFM latents. Models capturing the distribution of face shape and texture across individuals are not only useful tools for stimulus generation. They also capture important information about how faces are perceived, suggesting that human face representations are tuned to the statistical distribution of faces.
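The abstract's central predictor is Euclidean distance between faces' coordinates in a PCA-based latent face space (the BFM's shape/texture coefficients). A minimal sketch of that computation, with illustrative latent vectors and an arbitrary dimensionality (the actual BFM uses far more components, and the vectors below are synthetic, not the paper's data):

```python
import numpy as np

def face_space_distance(u: np.ndarray, v: np.ndarray) -> float:
    """Euclidean distance between two faces' latent coefficient vectors.

    In a PCA-based morphable model (e.g. the BFM), each face is a vector
    of shape/texture principal-component coefficients; this distance is
    the dissimilarity predictor described in the abstract.
    """
    return float(np.linalg.norm(u - v))

# Illustrative example: two synthetic faces in a 10-dimensional latent
# space (dimensionality chosen arbitrarily for this sketch).
rng = np.random.default_rng(0)
face_a = rng.standard_normal(10)
face_b = rng.standard_normal(10)

print(face_space_distance(face_a, face_b))
```

Larger distances in this space would, per the paper's result, predict greater perceived dissimilarity and a higher likelihood of a "different identity" judgment.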