Miao Hui-Yuan, Tong Frank
Department of Psychology, Vanderbilt University, Nashville, TN, 37240, USA.
Vanderbilt Vision Research Center, Vanderbilt University, Nashville, TN, 37240, USA.
bioRxiv. 2023 Aug 28:2023.08.26.554952. doi: 10.1101/2023.08.26.554952.
Computational models of the primary visual cortex (V1) have suggested that V1 neurons behave like Gabor filters followed by simple non-linearities. However, recent work employing convolutional neural network (CNN) models has suggested that V1 relies on far more non-linear computations than previously thought. Specifically, unit responses in an intermediate layer of VGG-19 were found to best predict macaque V1 responses to thousands of natural and synthetic images. Here, we evaluated the hypothesis that the poor performance of lower-layer units in VGG-19 might be attributable to their small receptive field size rather than to their lack of complexity. We compared VGG-19 with AlexNet, which has much larger receptive fields in its lower layers. Whereas the best-performing layer of VGG-19 occurred after seven non-linear steps, the first convolutional layer of AlexNet best predicted V1 responses. Although VGG-19's predictive accuracy was somewhat better than that of standard AlexNet, we found that a modified version of AlexNet could match VGG-19's performance after only a few non-linear computations. Control analyses revealed that decreasing the size of the input images caused the best-performing layer of VGG-19 to shift to a lower layer, consistent with the hypothesis that the relationship between image size and receptive field size can strongly affect model performance. We conducted additional analyses using a Gabor pyramid model to test for non-linear contributions of normalization and contrast saturation. Overall, our findings suggest that the feedforward responses of V1 neurons can be well explained by assuming only a few non-linear processing stages.
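The receptive field argument above can be made concrete with a short calculation. The following is a minimal sketch, not the authors' analysis code: it applies the standard recurrence for theoretical receptive field size (each layer adds `(kernel - 1) * jump` pixels, where `jump` is the product of all preceding strides) to the lower layers of both networks. The layer names and the helper `receptive_fields` are illustrative assumptions. The kernel and stride values follow the commonly published architectures (3x3 stride-1 convolutions and 2x2 stride-2 pooling in VGG-19; an 11x11 stride-4 first convolution in AlexNet) and may not match the modified AlexNet described in the abstract.

```python
# Theoretical receptive field (RF) size, in input pixels, for the lower
# layers of VGG-19 and AlexNet. Layer specs are (name, kernel, stride).
# Names and specs are assumptions based on the standard architectures,
# not values taken from the paper.

VGG19_LOWER = [
    ("conv1_1", 3, 1),
    ("conv1_2", 3, 1),
    ("pool1", 2, 2),
    ("conv2_1", 3, 1),
    ("conv2_2", 3, 1),
    ("pool2", 2, 2),
    ("conv3_1", 3, 1),
]

ALEXNET_LOWER = [
    ("conv1", 11, 4),
    ("pool1", 3, 2),
    ("conv2", 5, 1),
]


def receptive_fields(layers):
    """Return a list of (layer name, RF size in pixels) for a layer stack."""
    rf = 1    # a single input pixel sees itself
    jump = 1  # distance in input pixels between adjacent units
    sizes = []
    for name, kernel, stride in layers:
        rf += (kernel - 1) * jump  # widen the RF by the kernel extent
        jump *= stride             # later layers step further per unit
        sizes.append((name, rf))
    return sizes


if __name__ == "__main__":
    for label, layers in (("VGG-19", VGG19_LOWER), ("AlexNet", ALEXNET_LOWER)):
        print(label)
        for name, rf in receptive_fields(layers):
            print(f"  {name}: {rf} x {rf} px")
```

Under these assumptions, a unit in AlexNet's first convolutional layer covers 11 x 11 input pixels, whereas a VGG-19 unit covers only 3 x 3 pixels after its first convolution and reaches 24 x 24 pixels only by `conv3_1`. These are theoretical sizes; the portion of the image a unit effectively responds to is typically smaller.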