Chen Yen-Ju, Sun Zitang, Nishida Shin'ya
Graduate School of Informatics, Kyoto University, Kyoto, Japan.
PLoS Comput Biol. 2025 Sep 11;21(9):e1013001. doi: 10.1371/journal.pcbi.1013001. eCollection 2025 Sep.
Perceptual organization in the human visual system involves neural mechanisms that spatially group and segment image areas based on local feature similarities, such as the temporal correlation of luminance changes. Successful segmentation models in computer vision, including graph-based algorithms and vision transformers, leverage similarity computations across all elements in an image, suggesting that effective similarity-based grouping should rely on a global computational process. However, whether human vision employs a similarly global computation remains unclear due to the absence of appropriate methods for manipulating similarity matrices across multiple elements within a stimulus. To investigate how "temporal similarity structures" influence human visual segmentation, we developed a stimulus generation algorithm based on a Vision Transformer. This algorithm independently controls within-area and cross-area similarities by adjusting the temporal correlation of luminance, color, and spatial phase attributes. To assess human segmentation performance with these generated texture stimuli, participants completed a temporal two-alternative forced-choice task, identifying which of two intervals contained a segmentable texture. The results showed that segmentation performance is significantly influenced by the configuration of both within- and cross-correlation across the elements, regardless of attribute type. Furthermore, human performance is closely aligned with predictions from a graph-based computational model, suggesting that human texture segmentation can be approximated by a global computational process that optimally integrates pairwise similarities across multiple elements.
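The core idea of similarity-based grouping over a whole set of elements can be illustrated with a small sketch. This is not the authors' method: their stimuli come from a Vision Transformer-based generator, and the abstract does not specify their graph-based model. Here, as assumptions for illustration only, each element's luminance is a toy time series built from shared latent signals so that within-area and cross-area correlations can be set independently, pairwise temporal correlations form a similarity matrix, and a normalized-cut spectral bisection (Shi and Malik) stands in for a generic graph-based model. All function names (`make_two_area_stimulus`, `similarity_matrix`, `normalized_cut_bisection`) are hypothetical.

```python
import numpy as np


def make_two_area_stimulus(n_per_area, n_frames, rho_within, rho_cross, seed=0):
    """Hypothetical toy generator (NOT the paper's ViT-based algorithm).

    Returns an array of shape (2 * n_per_area, n_frames): one luminance time
    series per element. Elements in the same area have expected pairwise
    correlation rho_within; elements in different areas have rho_cross.
    Requires 0 <= rho_cross <= rho_within <= 1.
    """
    if not 0.0 <= rho_cross <= rho_within <= 1.0:
        raise ValueError("need 0 <= rho_cross <= rho_within <= 1")
    rng = np.random.default_rng(seed)
    shared = rng.standard_normal(n_frames)                     # common to all elements
    per_area = rng.standard_normal((2, n_frames))               # one signal per area
    private = rng.standard_normal((2 * n_per_area, n_frames))   # independent per element
    area = np.repeat([0, 1], n_per_area)                        # area index of each element
    # Unit-variance mixture: the weights squared sum to 1, so the covariances
    # equal the target correlations (rho_within inside an area, rho_cross across).
    return (np.sqrt(rho_cross) * shared
            + np.sqrt(rho_within - rho_cross) * per_area[area]
            + np.sqrt(1.0 - rho_within) * private)


def similarity_matrix(x):
    """Pairwise temporal correlation between elements, clipped to be non-negative."""
    w = np.clip(np.corrcoef(x), 0.0, None)
    np.fill_diagonal(w, 0.0)
    return w


def normalized_cut_bisection(w):
    """Spectral approximation to a two-way normalized cut over ALL pairwise similarities.

    Returns (labels, ncut_value); a lower ncut value means the elements split
    more cleanly into two internally similar, mutually dissimilar groups.
    """
    d = np.maximum(w.sum(axis=1), 1e-12)           # node degrees (guarded against 0)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    lap = np.eye(len(w)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)                   # eigenvalues in ascending order
    fiedler = d_inv_sqrt * vecs[:, 1]               # second generalized eigenvector
    labels = fiedler > 0
    cut = w[labels][:, ~labels].sum()               # total similarity across the split
    ncut = cut / d[labels].sum() + cut / d[~labels].sum()
    return labels.astype(int), ncut


if __name__ == "__main__":
    # High within-area vs. low cross-area correlation: should be easy to segment.
    x_s = make_two_area_stimulus(16, 200, rho_within=0.8, rho_cross=0.2, seed=1)
    labels_s, ncut_s = normalized_cut_bisection(similarity_matrix(x_s))
    # Equal within- and cross-area correlation: no area structure to find.
    x_u = make_two_area_stimulus(16, 200, rho_within=0.5, rho_cross=0.5, seed=1)
    _, ncut_u = normalized_cut_bisection(similarity_matrix(x_u))
    print(labels_s, round(ncut_s, 3), round(ncut_u, 3))
```

In this toy setting, the normalized-cut value could serve as a rough "segmentability" score for a 2AFC-style comparison: the interval whose stimulus yields the lower value would be judged to contain the segmentable texture. The design choice worth noting is that the cut depends on every pairwise similarity, not only on the boundary between neighbouring elements, which is the sense in which such a model is "global".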