Eleanor M. Saffran Center for Cognitive Neuroscience, Temple University.
Technion-Israel Institute of Technology.
J Exp Psychol Gen. 2023 Sep;152(9):2578-2590. doi: 10.1037/xge0001389. Epub 2023 Apr 20.
Much of our understanding of word meaning has been informed through studies of single words. High-dimensional semantic space models have recently proven instrumental in elucidating connections between words. Here we show how bigram semantic distance can yield novel insights into conceptual cohesion and topic flow when computed over continuous language samples. For example, "Cats drink milk" comprises an ordered vector of bigrams (cat-drink, drink-milk). Each of these bigrams has a unique semantic distance. These distances in turn may provide a metric of dispersion or the flow of concepts as language unfolds. We offer an R package ("semdistflow") that transforms any user-specified language transcript into a vector of ordered bigrams, appending two metrics of semantic distance to each pair. We validated these distance metrics on a continuous stream of simulated verbal fluency data, assigning predicted switch markers between alternating semantic clusters (animals, musical instruments, fruit). We then generated bigram distance norms on a large sample of text and demonstrated applications of the technique to a classic work of short fiction (London, 1908). In one application, we showed that bigrams spanning sentence boundaries are punctuated by jumps in semantic distance. We discuss the promise of this technique for characterizing semantic processing in real-world narratives and for bridging findings at the single-word level with macroscale discourse analyses. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
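The ordered-bigram idea described in the abstract can be illustrated with a minimal Python sketch. This is not the semdistflow package or its API: the three-dimensional word vectors are invented for illustration, the function names are hypothetical, the input is assumed to be already cleaned and lemmatized, and plain cosine distance stands in for the two distance metrics the package appends.

```python
import math

# Hypothetical toy word vectors, invented for illustration only.
# A real analysis would draw vectors from a trained semantic space model.
TOY_VECTORS = {
    "cat": (0.9, 0.1, 0.2),
    "drink": (0.3, 0.8, 0.1),
    "milk": (0.4, 0.7, 0.3),
}


def ordered_bigrams(tokens):
    """Return adjacent word pairs in the order they occur in the sample."""
    return list(zip(tokens, tokens[1:]))


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def bigram_distances(tokens, vectors):
    """Pair each ordered bigram with the distance between its two words."""
    return [
        (first, second, cosine_distance(vectors[first], vectors[second]))
        for first, second in ordered_bigrams(tokens)
    ]


# "Cats drink milk" after the assumed lemmatization step.
rows = bigram_distances(["cat", "drink", "milk"], TOY_VECTORS)
for first, second, distance in rows:
    print(f"{first}-{second}: {distance:.3f}")
```

Reading the resulting distances in sequence gives the kind of running series the abstract describes, where a large value relative to its neighbors would mark a candidate shift in topic or semantic cluster.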