Probing the link between vision and language in material perception using psychophysics and unsupervised learning.

Affiliations

American University, Department of Neuroscience, Washington DC, United States of America.

The University of Tokyo, Graduate School of Information Science and Technology, Tokyo, Japan.

Publication

PLoS Comput Biol. 2024 Oct 3;20(10):e1012481. doi: 10.1371/journal.pcbi.1012481. eCollection 2024 Oct.

Abstract

We can visually discriminate and recognize a wide range of materials. Meanwhile, we use language to describe what we see and communicate relevant information about the materials. Here, we investigate the relationship between visual judgment and language expression to understand how visual features relate to semantic representations in human cognition. We use deep generative models to generate images of realistic materials. Interpolating between the generative models enables us to systematically create material appearances in both well-defined and ambiguous categories. Using these stimuli, we compared the representations of materials from two behavioral tasks: visual material similarity judgments and free-form verbal descriptions. Our findings reveal a moderate but significant correlation between vision and language on a categorical level. However, analyzing the representations with an unsupervised alignment method, we discover structural differences that arise at the image-to-image level, especially among ambiguous materials morphed between known categories. Moreover, visual judgments exhibit more individual differences compared to verbal descriptions. Our results show that while verbal descriptions capture material qualities on the coarse level, they may not fully convey the visual nuances of material appearances. Analyzing the image representation of materials obtained from various pre-trained deep neural networks, we find that similarity structures in human visual judgments align more closely with those of the vision-language models than purely vision-based models. Our work illustrates the need to consider the vision-language relationship in building a comprehensive model for material perception. Moreover, we propose a novel framework for evaluating the alignment and misalignment between representations from different modalities, leveraging information from human behaviors and computational models.
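The abstract references two quantitative steps that a short sketch can make concrete: morphing material appearance by interpolating between generative-model latent codes, and correlating the similarity structures obtained from visual judgments and verbal descriptions (a representational-similarity-style comparison). The Python sketch below is a minimal illustration under stated assumptions, not the authors' code: the linear latent interpolation, the correlation-distance metric, and the random stand-in embeddings (vision_emb, language_emb) are all hypothetical placeholders.

import numpy as np
from scipy.stats import spearmanr
from scipy.spatial.distance import pdist

def morph_latents(z_a, z_b, n_steps=11):
    """Linearly interpolate between two generator latent codes to morph
    material A into material B (hypothetical stand-in for the paper's
    interpolation between generative models)."""
    alphas = np.linspace(0.0, 1.0, n_steps)
    return [(1 - a) * z_a + a * z_b for a in alphas]

def rdm(embeddings):
    """Condensed representational dissimilarity matrix: pairwise
    correlation distances between per-stimulus feature vectors."""
    return pdist(embeddings, metric="correlation")

# Hypothetical embeddings: one row per stimulus image.
rng = np.random.default_rng(0)
vision_emb = rng.random((30, 64))    # stand-in for visual similarity-judgment features
language_emb = rng.random((30, 64))  # stand-in for verbal-description features

# Second-order (RSA-style) correlation between the two similarity structures.
rho, p = spearmanr(rdm(vision_emb), rdm(language_emb))
print(f"vision-language correlation: rho={rho:.2f}, p={p:.3f}")

This second-order correlation corresponds to the coarse categorical comparison the abstract reports; the finer image-to-image analysis in the paper uses a separate unsupervised alignment method that this sketch does not attempt to reproduce.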

Fig 1. https://cdn.ncbi.nlm.nih.gov/pmc/blobs/02e9/11478833/f8ec7de9bcf7/pcbi.1012481.g001.jpg
