Doshi Fenil R, Konkle Talia, Alvarez George A
Department of Psychology, Harvard University, Cambridge, Massachusetts, United States of America.
Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Cambridge, Massachusetts, United States of America.
PLoS Comput Biol. 2025 Aug 18;21(8):e1013391. doi: 10.1371/journal.pcbi.1013391. eCollection 2025 Aug.
Deep neural network models provide a powerful experimental platform for exploring core mechanisms underlying human visual perception, such as perceptual grouping and contour integration, the process of linking local edge elements to arrive at a unified perceptual representation of a complete contour. Here, we demonstrate that feedforward convolutional neural networks (CNNs) fine-tuned on contour detection show this human-like capacity without relying on mechanisms proposed in prior work, such as lateral connections, recurrence, or top-down feedback. We identified two key properties needed for ImageNet-pretrained feedforward models to yield human-like contour integration: first, a progressively increasing receptive field structure served as a critical architectural motif supporting this capacity; and second, fine-tuning for contour detection biased specifically toward gradual curves (~20 degrees) resulted in human-like sensitivity to curvature. We further demonstrate that fine-tuning ImageNet-pretrained models uncovers other hidden human-like capacities in feedforward networks, including uncrowding (reduced interference from distractors as the number of distractors increases), which is considered a signature of human perceptual grouping. Taken together, these results provide a computational existence proof that purely feedforward hierarchical computations are capable of implementing the Gestalt "good continuation" and perceptual organization needed for human-like contour integration and uncrowding. More broadly, these results raise the possibility that in human vision, later stages of processing play a more prominent role in perceptual organization than implied by theories focused on recurrence and early lateral connections.
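To make the "progressively increasing receptive field" property concrete, the following is a minimal sketch of how receptive field size grows through a stack of convolution and pooling layers in a purely feedforward network. It uses the standard recurrence r_l = r_(l-1) + (k_l - 1) * j_(l-1), with j_l = j_(l-1) * s_l, where k is kernel size, s is stride, and j is the spacing in input pixels between adjacent units. The function name `receptive_fields` and the layer list `hypothetical_stack` are illustrative assumptions, not taken from the paper; the stack is an AlexNet-style configuration chosen only as an example and does not claim to be the architecture the authors studied.

```python
def receptive_fields(layers):
    """Return the receptive field size (in input pixels) after each layer.

    `layers` is a list of (kernel_size, stride) pairs applied in order.
    Padding and dilation are ignored; padding does not change the size.
    """
    rf = 1    # a single input pixel covers itself
    jump = 1  # spacing, in input pixels, between adjacent units
    sizes = []
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each extra tap widens the field by one jump
        jump *= stride             # striding spreads units further apart
        sizes.append(rf)
    return sizes


# Hypothetical AlexNet-style stack of (kernel_size, stride) pairs,
# used for illustration only (assumption, not from the source).
hypothetical_stack = [
    (11, 4),  # conv
    (3, 2),   # pool
    (5, 1),   # conv
    (3, 2),   # pool
    (3, 1),   # conv
    (3, 1),   # conv
    (3, 1),   # conv
    (3, 2),   # pool
]

if __name__ == "__main__":
    for depth, size in enumerate(receptive_fields(hypothetical_stack), start=1):
        print(f"layer {depth}: receptive field {size} x {size} pixels")
```

Under these assumptions, each successive layer pools over a wider region of the image, which is the kind of structure that could allow later units to span several local edge elements of a contour without any lateral or recurrent connections.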