Department of Psychology, Kavli Institute for Brain and Mind, UC San Diego, La Jolla, CA 92093, USA.
Center for Academic Research & Training in Anthropogeny, Kavli Institute for Brain and Mind, UC San Diego, La Jolla, CA 92093, USA.
Proc Biol Sci. 2022 Mar 9;289(1970):20212657. doi: 10.1098/rspb.2021.2657.
To convey meaning, human language relies on hierarchically organized, long-range relationships spanning words, phrases, sentences and discourse. As the distance between elements (e.g. phonemes, characters, words) in human language sequences increases, the strength of the long-range relationships between those elements decays following a power law. This power-law relationship has been variously attributed to long-range sequential organization present in human language syntax, semantics and discourse structure. However, non-linguistic behaviours in numerous phylogenetically distant species, ranging from humpback whale song to fruit fly motility, also demonstrate similar long-range statistical dependencies. Therefore, we hypothesized that long-range statistical dependencies in human speech may occur independently of linguistic structure. To test this hypothesis, we measured long-range dependencies in several speech corpora from children (aged 6 months to 12 years). We find that adult-like power-law statistical dependencies are present in human vocalizations at the earliest detectable ages, prior to the production of complex linguistic structure. These linguistic structures cannot, therefore, be the sole cause of long-range statistical dependencies in language.
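As a rough illustration of what "dependency strength decaying with distance following a power law" can mean operationally, the sketch below estimates the mutual information between symbols a fixed distance apart in a sequence, then fits a straight line in log-log space to recover a decay exponent. This is a minimal sketch under stated assumptions, not the authors' analysis pipeline: the choice of mutual information as the dependency measure, the simple plug-in estimator, the least-squares fit, and the function names `mutual_information` and `fit_power_law` are all assumptions introduced here for illustration.

```python
import math
from collections import Counter


def mutual_information(seq, distance):
    """Plug-in estimate (in bits) of the mutual information between
    symbols that are `distance` positions apart in `seq`.

    Assumption: `seq` is a string or list of hashable symbols and
    0 < distance < len(seq). The plug-in estimator is biased upward
    for short sequences; no bias correction is applied here.
    """
    pairs = list(zip(seq[:-distance], seq[distance:]))
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (left[a] * right[b]))
    return mi


def fit_power_law(distances, values):
    """Fit values ~ scale * distance ** (-alpha) by ordinary least squares
    on log-transformed data. Returns (scale, alpha).

    Assumption: all distances and values are strictly positive.
    """
    xs = [math.log(d) for d in distances]
    ys = [math.log(v) for v in values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope
```

In use, one would compute `mutual_information(seq, d)` for a range of distances `d` over a transcribed sequence and pass the resulting curve to `fit_power_law`. A log-log linear fit is only a convenient first check: it does not by itself distinguish a power law from, for example, an exponential decay, which would need an explicit model comparison.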