How Many Is Enough? Statistical Principles for Lexicostatistics

Authors

Zhang Menghan, Gong Tao

Affiliations

Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center of Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China.

Haskins Laboratories, New Haven, CT, USA; Center for Linguistics and Applied Linguistics, Guangdong University of Foreign Studies, Guangdong, China.

Publication

Front Psychol. 2016 Dec 12;7:1916. doi: 10.3389/fpsyg.2016.01916. eCollection 2016.

Abstract

Lexicostatistics has been applied in linguistics to reveal phylogenetic relations among languages. Two important but under-studied parameters govern this approach: the conventional size of the vocabulary list used to collect potentially true cognates, and the minimum number of matching instances required to confirm a recurrent sound correspondence. Here, we derive two statistical principles from stochastic theorems to quantify these parameters. These principles validate the practice of using the Swadesh 100- and 200-word lists to indicate the degree of relatedness between languages, and they enable a frequency-based, dynamic threshold for detecting recurrent sound correspondences. Using statistical tests, we further evaluate the generality of the Swadesh 100-word list compared with the Swadesh 200-word list and with other 100-word lists sampled randomly from the Swadesh 200-word list. Together, these results provide mathematical support for applying lexicostatistics in historical and comparative linguistics.
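As an illustration only (a generic binomial chance-matching model, not the derivation used in the paper), the sketch below shows one way a frequency-based, dynamic threshold for sound correspondences can arise. It assumes that a sound a in language A and a sound b in language B occur independently, with relative frequencies `freq_a` and `freq_b` in the aligned position. Under that assumption, a chance a:b match in one meaning-aligned word pair has probability p = `freq_a` * `freq_b`, and the number of chance matches over an n-item list follows Binomial(n, p). The function names `binomial_tail` and `min_matches`, the independence assumption, and the significance level `alpha` are hypothetical choices made for this example.

```python
from math import comb


def binomial_tail(n: int, p: float, k: int) -> float:
    """Return P(K >= k) for K ~ Binomial(n, p)."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_matches(n: int, freq_a: float, freq_b: float, alpha: float = 0.05) -> int:
    """Smallest number of a:b matches in an n-item list unlikely to arise by chance.

    Hypothetical model: sounds a and b are assumed to occur independently, so a
    single aligned word pair shows a:b by accident with probability
    freq_a * freq_b. Returns n + 1 when no attainable count is significant.
    """
    p = freq_a * freq_b  # assumed chance-match probability per word pair
    for k in range(1, n + 2):
        if binomial_tail(n, p, k) < alpha:
            return k
    return n + 1  # unreachable: the tail at k = n + 1 is 0.0


if __name__ == "__main__":
    # Compare thresholds for list lengths of 100 and 200 items
    # with illustrative (made-up) sound frequencies of 0.1 each.
    for n in (100, 200):
        print(n, min_matches(n, 0.1, 0.1))
```

Under this model, the threshold rises with both the list length n and the frequencies of the sounds involved, which illustrates why a single fixed count of matching instances cannot suit every pair of sounds.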
