Department of Statistics, University of British Columbia, Vancouver, BC V6T 1Z4, Canada.
Proc Natl Acad Sci U S A. 2013 Mar 12;110(11):4224-9. doi: 10.1073/pnas.1204678110. Epub 2013 Feb 11.
One of the oldest problems in linguistics is reconstructing the words that appeared in the protolanguages from which modern languages evolved. Identifying the forms of these ancient languages makes it possible to evaluate proposals about the nature of language change and to draw inferences about human history. Protolanguages are typically reconstructed using a painstaking manual process known as the comparative method. We present a family of probabilistic models of sound change as well as algorithms for performing inference in these models. The resulting system automatically and accurately reconstructs protolanguages from modern languages. We apply this system to 637 Austronesian languages, providing an accurate, large-scale automatic reconstruction of a set of protolanguages. Over 85% of the system's reconstructions are within one character of the manual reconstruction provided by a linguist specializing in Austronesian languages. Being able to automatically reconstruct large numbers of languages provides a useful way to quantitatively explore hypotheses about the factors determining which sounds in a language are likely to change over time. We demonstrate this by showing that the reconstructed Austronesian protolanguages provide compelling support for a hypothesis, first proposed in 1955, about the relationship between the function of a sound and its probability of changing.
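The "within one character" figure can be read as an edit-distance threshold: a reconstruction counts as close if turning it into the linguist's form takes at most one single-character insertion, deletion, or substitution. The sketch below assumes standard Levenshtein distance with unit costs and treats each Python character as one segment; the function names and the word forms are hypothetical illustrations, not the paper's code or data.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of unit-cost insertions, deletions, or substitutions
    needed to turn string a into string b (assumed metric, for illustration)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def fraction_within(system, reference, k=1):
    """Hypothetical helper: share of system forms whose edit distance to the
    paired reference form is at most k."""
    pairs = list(zip(system, reference))
    if not pairs:
        raise ValueError("need at least one (system, reference) pair")
    hits = sum(1 for s, r in pairs if levenshtein(s, r) <= k)
    return hits / len(pairs)


# Hypothetical forms, not taken from the paper's data.
system_forms = ["mata", "manu", "ikan"]
reference_forms = ["mata", "manuk", "ikur"]
print(fraction_within(system_forms, reference_forms))  # 2 of 3 pairs within one edit
```

Here "manu" vs "manuk" differs by one deletion and passes, while "ikan" vs "ikur" needs two substitutions and fails. A real evaluation would first have to fix what counts as one "character" in a phonetic transcription (for example, whether a digraph is one segment or two).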