Zhang Menghan, Gong Tao
Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center of Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China.
Haskins Laboratories, New Haven, CT, USA; Center for Linguistics and Applied Linguistics, Guangdong University of Foreign Studies, Guangdong, China.
Front Psychol. 2016 Dec 12;7:1916. doi: 10.3389/fpsyg.2016.01916. eCollection 2016.
Lexicostatistics has been applied in linguistics to inform phylogenetic relations among languages. Two important yet understudied parameters in this approach are the conventional size of the vocabulary list used to collect potentially true cognates and the minimum number of matching instances required to confirm a recurrent sound correspondence. Here, we derive two statistical principles from stochastic theorems to quantify these parameters. These principles validate the practice of using the Swadesh 100- and 200-word lists to indicate the degree of relatedness between languages, and they enable a frequency-based, dynamic threshold for detecting recurrent sound correspondences. Using statistical tests, we further evaluate the generality of the Swadesh 100-word list compared with the Swadesh 200-word list and with other 100-word lists sampled randomly from the Swadesh 200-word list. Together, these results provide mathematical support for applying lexicostatistics in historical and comparative linguistics.