Bigram semantic distance as an index of continuous semantic flow in natural language: Theory, tools, and applications.

Author affiliations

Eleanor M. Saffran Center for Cognitive Neuroscience, Temple University.

Technion-Israel Institute of Technology.

Publication information

J Exp Psychol Gen. 2023 Sep;152(9):2578-2590. doi: 10.1037/xge0001389. Epub 2023 Apr 20.

Abstract

Much of our understanding of word meaning has been informed through studies of single words. High-dimensional semantic space models have recently proven instrumental in elucidating connections between words. Here we show how bigram semantic distance can yield novel insights into conceptual cohesion and topic flow when computed over continuous language samples. For example, "Cats drink milk" comprises an ordered vector of bigrams (cat-drink, drink-milk). Each of these bigrams has a unique semantic distance. These distances in turn may provide a metric of dispersion or the flow of concepts as language unfolds. We offer an R package ("semdistflow") that transforms any user-specified language transcript into a vector of ordered bigrams, appending two metrics of semantic distance to each pair. We validated these distance metrics on a continuous stream of simulated verbal fluency data, assigning predicted switch markers between alternating semantic clusters (animals, musical instruments, fruit). We then generated bigram distance norms on a large sample of text and demonstrated applications of the technique to a classic work of short fiction (London, 1908). In one application, we showed that bigrams spanning sentence boundaries are punctuated by jumps in semantic distance. We discuss the promise of this technique for characterizing semantic processing in real-world narratives and for bridging findings at the single-word level with macroscale discourse analyses. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
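The bigram idea in the abstract can be illustrated with a minimal Python sketch. It is not the semdistflow package or its API: the three-dimensional word vectors are invented toy values, the function names are hypothetical, and cosine distance is assumed as the metric because the abstract does not say which two metrics the package uses.

```python
import math

# Hypothetical toy embeddings, invented for illustration only.
# A real analysis would use vectors from a trained semantic space model.
TOY_VECTORS = {
    "cat": (1.0, 0.0, 0.0),
    "drink": (0.0, 1.0, 0.0),
    "milk": (0.0, 1.0, 1.0),
}


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def bigram_distances(words, vectors):
    """Pair each word with its successor and attach a distance to each pair."""
    pairs = zip(words, words[1:])
    return [(w1, w2, cosine_distance(vectors[w1], vectors[w2])) for w1, w2 in pairs]


# "Cats drink milk" after lemmatization, as in the abstract's example.
result = bigram_distances(["cat", "drink", "milk"], TOY_VECTORS)
for w1, w2, dist in result:
    print(f"{w1}-{w2}: {dist:.3f}")
```

With these toy vectors the cat-drink pair is further apart than the drink-milk pair. In a longer transcript, a run of such distances could be scanned for jumps, such as those the abstract reports at sentence boundaries.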

Similar articles

Tracking word semantic change in biomedical literature.
Int J Med Inform. 2018 Jan;109:76-86. doi: 10.1016/j.ijmedinf.2017.11.006. Epub 2017 Nov 13.

References cited in this article

Neural evidence of switch processes during semantic and phonetic foraging in human memory.
Proc Natl Acad Sci U S A. 2023 Oct 17;120(42):e2312462120. doi: 10.1073/pnas.2312462120. Epub 2023 Oct 12.
