How Many Is Enough? Statistical Principles for Lexicostatistics

Author information

Zhang Menghan, Gong Tao

Affiliations

Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center of Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China.

Haskins Laboratories, New Haven, CT, USA; Center for Linguistics and Applied Linguistics, Guangdong University of Foreign Studies, Guangdong, China.

Publication information

Front Psychol. 2016 Dec 12;7:1916. doi: 10.3389/fpsyg.2016.01916. eCollection 2016.

DOI:10.3389/fpsyg.2016.01916

PMID:28018261

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5149542/

Abstract

Lexicostatistics has been applied in linguistics to inform phylogenetic relations among languages. There are two important yet not well-studied parameters in this approach: the conventional size of vocabulary list to collect potentially true cognates and the minimum matching instances required to confirm a recurrent sound correspondence. Here, we derive two statistical principles from stochastic theorems to quantify these parameters. These principles validate the practice of using the Swadesh 100- and 200-word lists to indicate degree of relatedness between languages, and enable a frequency-based, dynamic threshold to detect recurrent sound correspondences. Using statistical tests, we further evaluate the generality of the Swadesh 100-word list compared to the Swadesh 200-word list and other 100-word lists sampled randomly from the Swadesh 200-word list. All these provide mathematical support for applying lexicostatistics in historical and comparative linguistics.
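The comparison between a full word list and 100-word lists sampled randomly from it can be illustrated with a small simulation. The sketch below is not the authors' procedure: the cognate judgments are hypothetical (80 of 200 items marked cognate), and the names `cognate_share` and `sample_sublist_shares` are invented for illustration. It only shows how the cognate share of random 100-item sublists scatters around the share of the full 200-item list.

```python
import random
import statistics

def cognate_share(flags):
    """Fraction of list items judged cognate (flags are booleans)."""
    return sum(flags) / len(flags)

def sample_sublist_shares(flags, size, trials, seed=0):
    """Cognate share of `trials` random sublists, each with `size` items."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [cognate_share(rng.sample(flags, size)) for _ in range(trials)]

# Hypothetical data, not taken from the paper:
# 200 meanings, 80 of which are marked as cognate between two languages.
full_list = [True] * 80 + [False] * 120

shares = sample_sublist_shares(full_list, size=100, trials=1000)
full_share = cognate_share(full_list)
mean_share = statistics.mean(shares)
spread = statistics.stdev(shares)

print(f"full 200-item share: {full_share:.2f}")
print(f"mean share over random 100-item sublists: {mean_share:.3f}")
print(f"standard deviation across sublists: {spread:.3f}")
```

With real data, `full_list` would be replaced by expert cognate judgments for a language pair, and the spread across sublists gives a rough sense of how much a single 100-word list could deviate from the 200-word baseline.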


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/832d/5149542/f22fe7e034f4/fpsyg-07-01916-g0001.jpg
