Department of Psychology, Vanderbilt University, Nashville, TN, USA.
Vanderbilt Vision Research Center, Vanderbilt University, Nashville, TN, USA.
J Vis. 2024 Jun 3;24(6):1. doi: 10.1167/jov.24.6.1.
Computational models of the primary visual cortex (V1) have suggested that V1 neurons behave like Gabor filters followed by simple nonlinearities. However, recent work employing convolutional neural network (CNN) models has suggested that V1 relies on far more nonlinear computations than previously thought. Specifically, unit responses in an intermediate layer of VGG-19 were found to best predict macaque V1 responses to thousands of natural and synthetic images. Here, we evaluated the hypothesis that the poor performance of lower layer units in VGG-19 might be attributable to their small receptive field size rather than to their lack of complexity per se. We compared VGG-19 with AlexNet, which has much larger receptive fields in its lower layers. Whereas the best-performing layer of VGG-19 occurred after seven nonlinear steps, the first convolutional layer of AlexNet best predicted V1 responses. Although the predictive accuracy of VGG-19 was somewhat better than that of standard AlexNet, we found that a modified version of AlexNet could match the performance of VGG-19 after only a few nonlinear computations. Control analyses revealed that decreasing the size of the input images caused the best-performing layer of VGG-19 to shift to a lower layer, consistent with the hypothesis that the relationship between image size and receptive field size can strongly affect model performance. We conducted additional analyses using a Gabor pyramid model to test for nonlinear contributions of normalization and contrast saturation. Overall, our findings suggest that the feedforward responses of V1 neurons can be well explained by assuming only a few nonlinear processing stages.
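The receptive field argument above can be made concrete with a short calculation. The following is a minimal sketch, not the authors' analysis code. It assumes the kernel sizes and strides of the standard published VGG-19 and AlexNet architectures (the abstract does not list them), ignores padding, and uses the hypothetical names `receptive_fields`, `VGG19_EARLY`, and `ALEXNET_EARLY`. It applies the usual recurrence in which each layer widens the receptive field by (kernel - 1) times the cumulative stride of the layers below it.

```python
# Sketch: theoretical receptive field (RF) size, in input pixels, of early layers.
# Layer specs are assumptions taken from the standard published architectures,
# not from the paper summarized above. Padding does not affect RF size.

def receptive_fields(layers):
    """Return [(name, rf_size)] for a stack of (name, kernel, stride) layers."""
    rf, jump = 1, 1  # RF of one input pixel; jump = cumulative stride so far
    sizes = []
    for name, kernel, stride in layers:
        rf += (kernel - 1) * jump  # each extra kernel tap adds `jump` input pixels
        jump *= stride             # later layers step over the input in larger strides
        sizes.append((name, rf))
    return sizes

# VGG-19: 3x3 convolutions (stride 1) and 2x2 max pooling (stride 2).
VGG19_EARLY = [
    ("conv1_1", 3, 1), ("conv1_2", 3, 1), ("pool1", 2, 2),
    ("conv2_1", 3, 1), ("conv2_2", 3, 1), ("pool2", 2, 2),
    ("conv3_1", 3, 1),
]

# AlexNet: 11x11 first convolution with stride 4, then 3x3 pooling (stride 2).
ALEXNET_EARLY = [("conv1", 11, 4), ("pool1", 3, 2), ("conv2", 5, 1)]

for label, spec in (("VGG-19", VGG19_EARLY), ("AlexNet", ALEXNET_EARLY)):
    for name, size in receptive_fields(spec):
        print(f"{label:8s}{name:8s} RF = {size} x {size} px")
```

Under these assumptions, a unit in the first convolutional layer of AlexNet covers an 11 x 11 pixel patch, whereas a unit in the first convolutional layer of VGG-19 covers only 3 x 3 pixels. VGG-19 units reach a comparable extent only after several stacked convolution and pooling stages, which illustrates why layer depth and receptive field size are confounded when comparing the two networks.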