

Probing the link between vision and language in material perception using psychophysics and unsupervised learning.

Affiliations

American University, Department of Neuroscience, Washington DC, United States of America.

The University of Tokyo, Graduate School of Information Science and Technology, Tokyo, Japan.

Publication

PLoS Comput Biol. 2024 Oct 3;20(10):e1012481. doi: 10.1371/journal.pcbi.1012481. eCollection 2024 Oct.

DOI: 10.1371/journal.pcbi.1012481
PMID: 39361707
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11478833/
Abstract

We can visually discriminate and recognize a wide range of materials. Meanwhile, we use language to describe what we see and communicate relevant information about the materials. Here, we investigate the relationship between visual judgment and language expression to understand how visual features relate to semantic representations in human cognition. We use deep generative models to generate images of realistic materials. Interpolating between the generative models enables us to systematically create material appearances in both well-defined and ambiguous categories. Using these stimuli, we compared the representations of materials from two behavioral tasks: visual material similarity judgments and free-form verbal descriptions. Our findings reveal a moderate but significant correlation between vision and language on a categorical level. However, analyzing the representations with an unsupervised alignment method, we discover structural differences that arise at the image-to-image level, especially among ambiguous materials morphed between known categories. Moreover, visual judgments exhibit more individual differences compared to verbal descriptions. Our results show that while verbal descriptions capture material qualities on the coarse level, they may not fully convey the visual nuances of material appearances. Analyzing the image representation of materials obtained from various pre-trained deep neural networks, we find that similarity structures in human visual judgments align more closely with those of the vision-language models than purely vision-based models. Our work illustrates the need to consider the vision-language relationship in building a comprehensive model for material perception. Moreover, we propose a novel framework for evaluating the alignment and misalignment between representations from different modalities, leveraging information from human behaviors and computational models.
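The core comparison described in the abstract — correlating the similarity structure of visual judgments with that of verbal descriptions "on a categorical level" — is, in outline, a representational similarity analysis: each modality yields a dissimilarity matrix over the same stimuli, and the matrices' off-diagonal entries are rank-correlated. A minimal NumPy/SciPy sketch of that comparison (the feature matrices here are random placeholders, not the study's data, and the function names are illustrative, not from the paper's code):

```python
import numpy as np
from scipy.stats import spearmanr
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)

# Placeholder feature vectors for 20 stimuli in two modalities
# (stand-ins for, e.g., visual similarity embeddings vs. embeddings
# of free-form verbal descriptions of the same material images).
vision_features = rng.normal(size=(20, 8))
language_features = rng.normal(size=(20, 8))

def rdm(features):
    """Representational dissimilarity matrix: pairwise distances between stimuli."""
    return squareform(pdist(features, metric="correlation"))

def rsa_correlation(rdm_a, rdm_b):
    """Spearman correlation of the upper triangles of two RDMs."""
    iu = np.triu_indices_from(rdm_a, k=1)
    rho, p = spearmanr(rdm_a[iu], rdm_b[iu])
    return rho, p

rho, p = rsa_correlation(rdm(vision_features), rdm(language_features))
print(f"RSA Spearman rho = {rho:.3f} (p = {p:.3f})")
```

Note that this category-level correlation is only half the paper's analysis: the structural differences it reports at the image-to-image level come from an unsupervised alignment method (Gromov-Wasserstein optimal transport, per the cited references), which matches stimuli across modalities without assuming the correspondence that the RDM comparison above takes for granted.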


Figures 1-8 (PMC11478833):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/02e9/11478833/f8ec7de9bcf7/pcbi.1012481.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/02e9/11478833/e4eaaa023df8/pcbi.1012481.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/02e9/11478833/50ddfff72b42/pcbi.1012481.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/02e9/11478833/f5440ac0133b/pcbi.1012481.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/02e9/11478833/8ab9305aca02/pcbi.1012481.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/02e9/11478833/cc3b09158528/pcbi.1012481.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/02e9/11478833/8a1904181bca/pcbi.1012481.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/02e9/11478833/84357d0733fe/pcbi.1012481.g008.jpg

Similar Articles

1. Probing the link between vision and language in material perception using psychophysics and unsupervised learning. PLoS Comput Biol. 2024 Oct 3;20(10):e1012481. doi: 10.1371/journal.pcbi.1012481. eCollection 2024 Oct.
2. Probing the Link Between Vision and Language in Material Perception Using Psychophysics and Unsupervised Learning. bioRxiv. 2024 May 17:2024.01.25.577219. doi: 10.1101/2024.01.25.577219.
3. Visual features as stepping stones toward semantics: Explaining object similarity in IT and perception with non-negative least squares. Neuropsychologia. 2016 Mar;83:201-226. doi: 10.1016/j.neuropsychologia.2015.10.023. Epub 2015 Oct 19.
4. Atoms of recognition in human and computer vision. Proc Natl Acad Sci U S A. 2016 Mar 8;113(10):2744-9. doi: 10.1073/pnas.1513198113. Epub 2016 Feb 16.
5. Neural Encoding and Decoding With Distributed Sentence Representations. IEEE Trans Neural Netw Learn Syst. 2021 Feb;32(2):589-603. doi: 10.1109/TNNLS.2020.3027595. Epub 2021 Feb 4.
6. ViSpa (Vision Spaces): A computer-vision-based representation system for individual images and concept prototypes, with large-scale evaluation. Psychol Rev. 2023 Jul;130(4):896-934. doi: 10.1037/rev0000392. Epub 2022 Oct 6.
7. Shared representations of human actions across vision and language. Neuropsychologia. 2024 Sep 9;202:108962. doi: 10.1016/j.neuropsychologia.2024.108962. Epub 2024 Jul 22.
8. An ecologically motivated image dataset for deep learning yields better models of human vision. Proc Natl Acad Sci U S A. 2021 Feb 23;118(8). doi: 10.1073/pnas.2011417118.
9. Unsupervised learning predicts human perception and misperception of gloss. Nat Hum Behav. 2021 Oct;5(10):1402-1417. doi: 10.1038/s41562-021-01097-6. Epub 2021 May 6.
10. What Does a Language-And-Vision Transformer See: The Impact of Semantic Information on Visual Representations. Front Artif Intell. 2021 Dec 3;4:767971. doi: 10.3389/frai.2021.767971. eCollection 2021.

References Cited in This Article

1. Gromov-Wasserstein unsupervised alignment reveals structural correspondences between the color similarity structures of humans and large language models. Sci Rep. 2024 Jul 10;14(1):15917. doi: 10.1038/s41598-024-65604-1.
2. Material category of visual objects computed from specular image structure. Nat Hum Behav. 2023 Jul;7(7):1152-1169. doi: 10.1038/s41562-023-01601-0. Epub 2023 Jun 29.
3. Assessing the representational structure of softness activated by words. Sci Rep. 2023 Jun 2;13(1):8974. doi: 10.1038/s41598-023-35169-6.
4. THINGS-data, a multimodal collection of large-scale datasets for investigating object representations in human brain and behavior. Elife. 2023 Feb 27;12:e82580. doi: 10.7554/eLife.82580.
5. Unsupervised learning reveals interpretable latent representations for translucency perception. PLoS Comput Biol. 2023 Feb 8;19(2):e1010878. doi: 10.1371/journal.pcbi.1010878. eCollection 2023 Feb.
6. Roughness perception: A multisensory/crossmodal perspective. Atten Percept Psychophys. 2022 Oct;84(7):2087-2114. doi: 10.3758/s13414-022-02550-y. Epub 2022 Aug 26.
7. Unsupervised learning of haptic material properties. Elife. 2022 Feb 23;11:e64876. doi: 10.7554/eLife.64876.
8. Crystal or jelly? Effect of color on the perception of translucent materials with photographs of real-world objects. J Vis. 2022 Feb 1;22(2):6. doi: 10.1167/jov.22.2.6.
9. The look and feel of soft are similar across different softness dimensions. J Vis. 2021 Sep 1;21(10):20. doi: 10.1167/jov.21.10.20.
10. Object representations in the human brain reflect the co-occurrence statistics of vision and language. Nat Commun. 2021 Jul 2;12(1):4081. doi: 10.1038/s41467-021-24368-2.