

Visual and Affective Multimodal Models of Word Meaning in Language and Mind.

Affiliations

School of Psychological Sciences, University of Melbourne.

School of Psychology, University of New South Wales.

Publication

Cogn Sci. 2021 Jan;45(1):e12922. doi: 10.1111/cogs.12922.

Abstract

One of the main limitations of natural language-based approaches to meaning is that they do not incorporate multimodal representations the way humans do. In this study, we evaluate how well different kinds of models account for people's representations of both concrete and abstract concepts. The models we compare include unimodal distributional linguistic models as well as multimodal models which combine linguistic with perceptual or affective information. There are two types of linguistic models: those based on text corpora and those derived from word association data. We present two new studies and a reanalysis of a series of previous studies. The studies demonstrate that both visual and affective multimodal models better capture behavior that reflects human representations than unimodal linguistic models. The size of the multimodal advantage depends on the nature of semantic representations involved, and it is especially pronounced for basic-level concepts that belong to the same superordinate category. Additional visual and affective features improve the accuracy of linguistic models based on text corpora more than those based on word associations; this suggests systematic qualitative differences between what information is encoded in natural language versus what information is reflected in word associations. Altogether, our work presents new evidence that multimodal information is important for capturing both abstract and concrete words and that fully representing word meaning requires more than purely linguistic information. Implications for both embodied and distributional views of semantic representation are discussed.
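The multimodal models described in the abstract augment a linguistic representation with perceptual or affective features. A minimal sketch of one such combination, concatenating a word's text-derived vector with affective (valence, arousal, dominance) norms before computing similarity, is shown below. All vectors here are invented toy values for illustration, not the paper's data or method:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy stand-ins for corpus-derived distributional vectors (hypothetical values).
linguistic = {
    "snake": np.array([0.9, 0.1, 0.3]),
    "puppy": np.array([0.7, 0.3, 0.5]),
}

# Toy affective features: (valence, arousal, dominance), scaled to [0, 1].
# "snake" is low-valence/high-arousal; "puppy" is high-valence.
affective = {
    "snake": np.array([0.20, 0.80, 0.30]),
    "puppy": np.array([0.95, 0.60, 0.70]),
}

def multimodal(word, weight=1.0):
    # Concatenate the linguistic vector with (optionally weighted)
    # affective features to form a joint representation.
    return np.concatenate([linguistic[word], weight * affective[word]])

# Unimodal (text-only) vs. multimodal similarity for the same word pair.
uni = cosine(linguistic["snake"], linguistic["puppy"])
multi = cosine(multimodal("snake"), multimodal("puppy"))
```

With these toy values, the affective mismatch between "snake" and "puppy" pulls the multimodal similarity below the text-only similarity, illustrating how affective features can add information that a purely linguistic model misses.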


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bc92/7816238/bf87061e1046/COGS-45-e12922-g001.jpg
