Department of Psychology, Humboldt-Universität zu Berlin, Berlin, Germany.
Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy.
Sci Rep. 2022 May 16;12(1):8043. doi: 10.1038/s41598-022-12027-5.
Large-scale linguistic data is nowadays available in abundance. Using this source of data, previous research has identified redundancies between the statistical structure of natural language and properties of the (physical) world we live in. For example, it has been shown that we can gauge city sizes by analyzing their respective word frequencies in corpora. However, since natural language is always produced by human speakers, we point out that such redundancies can only come about indirectly and should necessarily be restricted to cases where human representations largely retain characteristics of the physical world. To demonstrate this, we examine the statistical occurrence of words referring to body parts in very different languages, covering nearly 4 billion native speakers. We focus on body parts because the convergence between language and the physical properties of stimuli clearly breaks down for the human body (i.e., more relevant and functional body parts are not necessarily larger in size). Our findings indicate that the human body as extracted from language does not retain its actual physical proportions; instead, it resembles the distorted human-like figure known as the sensory homunculus, whose form depicts the amount of cortical area dedicated to the sensorimotor functions of each body part (and, thus, their relative functional relevance). This demonstrates that the surface-level statistical structure of language opens a window into how humans represent the world they live in, rather than into the world itself.