School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, Texas 75080, USA
Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland 21218, USA
Annu Rev Vis Sci. 2021 Sep 15;7:543-570. doi: 10.1146/annurev-vision-093019-111701. Epub 2021 Aug 4.
Deep learning models currently achieve human levels of performance on real-world face recognition tasks. We review scientific progress in understanding human face processing using computational approaches based on deep learning. This review is organized around three fundamental advances. First, deep networks trained for face identification generate a representation that retains structured information about the face (e.g., identity, demographics, appearance, social traits, expression) and the input image (e.g., viewpoint, illumination). This forces us to rethink the universe of possible solutions to the problem of inverse optics in vision. Second, deep learning models indicate that high-level visual representations of faces cannot be understood in terms of interpretable features. This has implications for understanding neural tuning and population coding in the high-level visual cortex. Third, learning in deep networks is a multistep process that forces theoretical consideration of diverse categories of learning that can overlap, accumulate over time, and interact. Diverse learning types are needed to model the development of human face processing skills, cross-race effects, and familiarity with individual faces.