Psychology Department, Hebrew University of Jerusalem, Jerusalem, Israel.
School of Philosophy, Psychology and Language Sciences, University of Edinburgh, Edinburgh, UK.
Sci Rep. 2024 Mar 4;14(1):5255. doi: 10.1038/s41598-024-56152-9.
Human language is unique in its structure: language is made up of parts that can be recombined in a productive way. The parts are not given but have to be discovered by learners exposed to unsegmented wholes. Across languages, the frequency distribution of those parts follows a power law. Both statistical properties (having parts, and having those parts follow a particular distribution) facilitate learning, yet their origin is still poorly understood. Where do the parts come from, and why do they follow a particular frequency distribution? Here, we show how these two core properties emerge from the process of cultural evolution with whole-to-part learning. We use an experimental analog of cultural transmission in which participants copy sets of non-linguistic sequences produced by a previous participant. This design allows us to ask whether parts will emerge purely under pressure for the system to be learnable, even without meanings to convey. We show that parts emerge from initially unsegmented sequences, that their distribution becomes closer to a power law over generations, and, importantly, that these properties make the sets of sequences more learnable. We argue that these two core statistical properties of language emerge culturally as both a cause and an effect of greater learnability.
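As a rough illustration of what "closer to a power law" can mean for a set of segmented parts, the sketch below counts how often each part occurs and fits a straight line to log(frequency) against log(rank). This is not the analysis used in the study: the function name `rank_frequency_slope` and the toy data are hypothetical, and a log-log regression is only a quick diagnostic (maximum-likelihood fitting is generally preferred for serious power-law estimation).

```python
import math
from collections import Counter


def rank_frequency_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank).

    A slope near -1 is the classic Zipf-like pattern; this is a rough
    diagnostic only, not a formal test for a power law.
    """
    # Frequencies of each distinct part, most frequent first (rank 1).
    counts = sorted(Counter(tokens).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct parts")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy data (not from the study): four parts whose counts
# 12, 6, 4, 3 are exactly proportional to 1/rank.
toy_parts = ["p1"] * 12 + ["p2"] * 6 + ["p3"] * 4 + ["p4"] * 3
slope = rank_frequency_slope(toy_parts)
print(f"log-log slope: {slope:.3f}")
```

Comparing this slope across generations of a transmission chain would give one simple, assumption-laden way to track whether the distribution of parts is drifting toward a Zipf-like shape.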