Psychology Department and Vanderbilt Vision Research Center, Vanderbilt University, Nashville, Tennessee, United States of America.
PLoS Biol. 2021 Dec 9;19(12):e3001418. doi: 10.1371/journal.pbio.3001418. eCollection 2021 Dec.
Deep neural networks (DNNs) for object classification have been argued to provide the most promising model of the visual system, accompanied by claims that they have attained or even surpassed human-level performance. Here, we evaluated whether DNNs provide a viable model of human vision when tested with challenging noisy images of objects, sometimes presented at the very limits of visibility. We show that popular state-of-the-art DNNs perform in a qualitatively different manner than humans: they are unusually susceptible to spatially uncorrelated white noise and less impaired by spatially correlated noise. We implemented a noise training procedure to determine whether noise-trained DNNs exhibit more robust responses that better match human behavioral and neural performance. We found that noise-trained DNNs provide a better qualitative match to human performance; moreover, they reliably predict human recognition thresholds on an image-by-image basis. Functional neuroimaging revealed that noise-trained DNNs provide a better correspondence to the pattern-specific neural representations found in both early visual areas and high-level object areas. A layer-specific analysis of the DNNs indicated that noise training led to broad-ranging modifications throughout the network, with greater benefits of noise robustness accruing in progressively higher layers. Our findings demonstrate that noise-trained DNNs provide a viable model to account for human behavioral and neural responses to objects in challenging noisy viewing conditions. Further, they suggest that robustness to noise may be acquired through a process of visual learning.
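The abstract distinguishes spatially uncorrelated white noise from spatially correlated noise but does not say how either was generated or mixed with the object images. The following is a minimal NumPy sketch of that distinction under stated assumptions: white noise is drawn independently per pixel from a Gaussian, correlated noise is produced by Fourier phase scrambling (one common technique, not necessarily the authors' procedure), and the image and noise are combined by a simple weighted sum. The function names, the `signal_weight` parameter, and the default mean and standard deviation are all hypothetical choices made for illustration.

```python
import numpy as np


def white_noise(shape, rng, mean=0.5, std=0.2):
    """Spatially uncorrelated noise: every pixel is drawn independently.

    The mean and std defaults are illustrative assumptions.
    """
    return rng.normal(mean, std, size=shape)


def phase_scrambled_noise(image, rng):
    """Spatially correlated noise via Fourier phase scrambling (assumed method).

    Keeps the amplitude spectrum of `image` and replaces its phases with
    random ones, so the result shares the image's spatial correlations
    but none of its recognizable structure.
    """
    amplitude = np.abs(np.fft.fft2(image))
    # Phases taken from the FFT of a real-valued random field have the
    # conjugate symmetry required for the inverse transform to be real.
    random_phase = np.angle(np.fft.fft2(rng.random(image.shape)))
    scrambled = np.fft.ifft2(amplitude * np.exp(1j * random_phase))
    return np.real(scrambled)


def mix(image, noise, signal_weight):
    """Weighted sum of image and noise, clipped to [0, 1] (assumed mixing rule).

    signal_weight = 1.0 returns the clean image; 0.0 returns pure noise.
    """
    mixed = signal_weight * image + (1.0 - signal_weight) * noise
    return np.clip(mixed, 0.0, 1.0)


def neighbor_correlation(field):
    """Pearson correlation between horizontally adjacent pixels."""
    left = field[:, :-1].ravel()
    right = field[:, 1:].ravel()
    return np.corrcoef(left, right)[0, 1]
```

Applied to a smooth test image, `neighbor_correlation` should be near zero for `white_noise` and high for `phase_scrambled_noise`, which is the property that separates the two noise types discussed above. Lowering `signal_weight` in `mix` moves an image toward the limits of visibility.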