Wundari Bayu Gautama, Fujita Ichiro, Ban Hiroshi
Graduate School of Frontier Biosciences, Osaka University, Suita, Japan.
Center for Information and Neural Networks (CiNet), Advanced ICT Research Institute, National Institute of Information and Communications Technology, Suita, Japan.
Commun Biol. 2025 Jul 11;8(1):1042. doi: 10.1038/s42003-025-08474-1.
Our visual brain transforms small differences between images in the two eyes (binocular disparity) into coherent depth. Initially, neurons in the primary visual cortex (V1) compute the degrees of overlap between the left and right images to encode disparity. Such cross-correlation-like neurons respond to both binocularly matched and mismatched features. This ambiguous representation is refined along the visual pathway through a cross-matching computation involving additional nonlinear processing to filter out mismatches. How these representations are organized in the human visual cortex remains unclear. Using functional magnetic resonance imaging (fMRI), we show that areas V1-V3 exhibit stronger cross-correlation components, while V3A/B, V7, hV4, and hMT+ are inclined towards cross-matching. A deep neural network (DNN) trained for stereo vision undergoes a similar transformation across its layers, progressing through distinct phases that exploit dissimilar features to achieve coherent depth. This brain-DNN alignment demonstrates that human and artificial visual systems share a computational principle for robust 3D vision.
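The distinction between the two computations can be illustrated with a toy one-dimensional example. This is a minimal sketch and not the model used in the study: the function names (`make_stereo_pair`, `cross_correlation`, `cross_matching`), the random-dot stimulus, the circular shift, and the use of half-wave rectification as the "additional nonlinear processing" are all assumptions made for illustration. A correlation-like unit responds with opposite sign to a contrast-inverted (mismatched) pair, whereas a matching-like unit suppresses that response.

```python
import random


def make_stereo_pair(n, disparity, anticorrelated=False, seed=0):
    """Build a toy 1D random-dot stereo pair (hypothetical stimulus).

    The right image is the left image circularly shifted by `disparity`.
    With anticorrelated=True the right image is contrast-inverted, so no
    feature in one eye has a true match in the other.
    """
    rng = random.Random(seed)
    left = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    sign = -1.0 if anticorrelated else 1.0
    right = [sign * left[(i - disparity) % n] for i in range(n)]
    return left, right


def cross_correlation(left, right, d):
    """Normalized overlap of the two images at candidate disparity d.

    The value is signed: it reaches +1 for a matched pair and -1 for a
    contrast-inverted pair at the true disparity.
    """
    n = len(left)
    return sum(left[i] * right[(i + d) % n] for i in range(n)) / n


def cross_matching(left, right, d):
    """Correlation followed by half-wave rectification (an assumed nonlinearity).

    Negative correlations, which signal mismatched features, are set to zero.
    """
    return max(0.0, cross_correlation(left, right, d))


if __name__ == "__main__":
    true_disparity = 3
    for label, anti in (("correlated", False), ("anticorrelated", True)):
        left, right = make_stereo_pair(64, true_disparity, anticorrelated=anti)
        corr = cross_correlation(left, right, true_disparity)
        match = cross_matching(left, right, true_disparity)
        print(f"{label}: correlation={corr:+.2f}, matching={match:.2f}")
```

In this sketch the correlation-like measure still carries a signal for the anticorrelated pair (with inverted sign), while the matching-like measure is silent, which loosely mirrors the contrast the abstract draws between V1-V3 and the higher areas.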