Department of Brain and Cognitive Sciences, MIT, Cambridge, MA, USA.
Department of Psychology, Yale University, New Haven, CT, USA.
Sci Adv. 2020 Mar 4;6(10):eaax5979. doi: 10.1126/sciadv.aax5979. eCollection 2020 Mar.
Vision not only detects and recognizes objects, but also performs rich inferences about the underlying scene structure that causes the patterns of light we see. Inverting generative models, or "analysis-by-synthesis", presents a possible solution, but its mechanistic implementations have typically been too slow for online perception, and their mapping to neural circuits remains unclear. Here we present a neurally plausible, efficient inverse graphics model and test it in the domain of face recognition. The model is based on a deep neural network that learns to invert a three-dimensional face graphics program in a single fast feedforward pass. It explains human behavior qualitatively and quantitatively, including the classic "hollow face" illusion, and it maps directly onto a specialized face-processing circuit in the primate brain. The model fits both behavioral and neural data better than state-of-the-art computer vision models, and suggests an interpretable reverse-engineering account of how the brain transforms images into percepts.
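The core idea of efficient inverse graphics can be illustrated with a minimal toy sketch: a generative "graphics program" renders latent scene parameters into images, and a recognition model is trained on renderer-generated (image, latent) pairs so that inversion becomes a single feedforward pass. The sketch below is a hypothetical illustration, not the paper's method: the renderer is a 2D Gaussian blob (standing in for the 3D face graphics program), and the recognition model is plain linear regression (standing in for the deep network).

```python
import numpy as np

# Hypothetical toy "graphics program": renders a latent scene vector
# z = (blob_x, blob_y) into a flattened 8x8 grayscale image.
# This is a stand-in for the paper's 3D face renderer.
def render(z, size=8):
    ys, xs = np.mgrid[0:size, 0:size]
    return np.exp(-((xs - z[0]) ** 2 + (ys - z[1]) ** 2) / 4.0).ravel()

rng = np.random.default_rng(0)

# Self-supervised training data: sample latents, render images.
# The renderer itself supplies the supervision signal.
Z = rng.uniform(1, 6, size=(500, 2))
X = np.stack([render(z) for z in Z])

# "Recognition model": here just linear least squares from pixels to
# latents (the paper uses a deep network). Add a bias column.
X1 = np.hstack([X, np.ones((len(X), 1))])
W, *_ = np.linalg.lstsq(X1, Z, rcond=None)

# Inference is a single feedforward pass: image -> latent scene estimate.
z_true = np.array([3.0, 4.0])
z_hat = np.append(render(z_true), 1.0) @ W
print(z_hat)
```

Once trained, inverting a new image costs only one matrix multiply, which is the sense in which analysis-by-synthesis becomes fast enough for online perception.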