Department of Linguistics, University of Pennsylvania, 3401-C Walnut Street 300C, Philadelphia, PA 19104, United States of America.
Department of Linguistics, University of Pennsylvania, 3401-C Walnut Street 300C, Philadelphia, PA 19104, United States of America; Department of Linguistics and Institute for Advanced Computational Science, Stony Brook University, Stony Brook, NY 11794, United States of America.
Cognition. 2020 Dec;205:104466. doi: 10.1016/j.cognition.2020.104466. Epub 2020 Oct 1.
Is language designed for communicative and functional efficiency? G. K. Zipf famously argued that shorter words are more frequent because they are easier to use, thereby resulting in the statistical law that bears his name. Yet, G. A. Miller showed that even a monkey randomly typing at a keyboard, and intermittently striking the space bar, would generate "words" with similar statistical properties. Recent quantitative analyses of human language lexicons (Piantadosi et al., 2012) have revived Zipf's functionalist hypothesis. Ambiguous words tend to be short, frequent, and easy to articulate in language production. Such statistical findings are commonly interpreted as evidence for pressure for efficiency, as the context of language use often provides cues to overcome lexical ambiguity. In this study, we update Miller's monkey thought experiment to incorporate empirically motivated phonological and semantic constraints on the creation of words. We claim that the appearance of communicative efficiency is a spandrel (Gould & Lewontin, 1979), as lexicons formed without the context of language use or reference to communication or efficiency exhibit comparable statistical properties. Furthermore, the updated monkey model provides a good fit for the growth trajectory of English as recorded in the Oxford English Dictionary. Focusing on the history of English words since 1900, we show that lexicons resulting from the monkey model provide a better embodiment of communicative efficiency than the actual lexicon of English. We conclude by arguing for the need to go beyond correlational statistics and to seek direct evidence for the mechanisms that underlie principles of language design.
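Miller's random-typing argument mentioned above can be illustrated with a minimal sketch. It covers only the original thought experiment, not the updated model with phonological and semantic constraints that this study proposes. The function names, the 26-letter alphabet, the space probability of 0.2, and the sample size are all illustrative assumptions rather than values taken from the paper.

```python
import random
from collections import Counter

def monkey_words(n_chars, alphabet="abcdefghijklmnopqrstuvwxyz",
                 p_space=0.2, seed=0):
    """Simulate n_chars random keystrokes; a space ends the current word.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    words, current = [], []
    for _ in range(n_chars):
        if rng.random() < p_space:
            # Space bar: close the current word, ignoring empty runs.
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(rng.choice(alphabet))
    return words

def mean_frequency_by_length(words):
    """Average token count per distinct word, grouped by word length."""
    counts = Counter(words)
    tokens, types = Counter(), Counter()
    for word, count in counts.items():
        tokens[len(word)] += count
        types[len(word)] += 1
    return {length: tokens[length] / types[length]
            for length in sorted(tokens)}

words = monkey_words(200_000)
freq = mean_frequency_by_length(words)
for length in (1, 2, 3):
    print(length, round(freq[length], 2))
```

Under these assumptions, shorter "words" come out more frequent on average than longer ones, even though nothing in the simulation refers to communication or efficiency. This is because there are far fewer possible short strings than long ones, so the same typing probability is shared among fewer word types.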