Integrating experiential and distributional data to learn semantic representations.

Authors

Andrews Mark, Vigliocco Gabriella, Vinson David

Affiliation

Cognitive, Perceptual and Brain Sciences, University College London, London, UK.

Publication information

Psychol Rev. 2009 Jul;116(3):463-98. doi: 10.1037/a0016261.

Abstract

The authors identify 2 major types of statistical data from which semantic representations can be learned. These are denoted as experiential data and distributional data. Experiential data are derived by way of experience with the physical world and comprise the sensory-motor data obtained through sense receptors. Distributional data, by contrast, describe the statistical distribution of words across spoken and written language. The authors claim that experiential and distributional data represent distinct data types and that each is a nontrivial source of semantic information. Their theoretical proposal is that human semantic representations are derived from an optimal statistical combination of these 2 data types. Using a Bayesian probabilistic model, they demonstrate how word meanings can be learned by treating experiential and distributional data as a single joint distribution and learning the statistical structure that underlies it. The semantic representations that are learned in this manner are measurably more realistic (as verified by comparison to a set of human-based measures of semantic representation) than those available from either data type individually or from both sources independently. This is not a result of merely using quantitatively more data, but rather it is because experiential and distributional data are qualitatively distinct, yet intercorrelated, types of data. The semantic representations that are learned are based on statistical structures that exist both within and between the experiential and distributional data types.