
Languages with more speakers tend to be harder to (machine-)learn.

Affiliations

Leibniz Institute for the German Language (IDS), Mannheim, Germany.

Publication information

Sci Rep. 2023 Oct 28;13(1):18521. doi: 10.1038/s41598-023-45373-z.

DOI: 10.1038/s41598-023-45373-z
PMID: 37898699
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10613286/
Abstract

Computational language models (LMs), most notably exemplified by the widespread success of OpenAI's ChatGPT chatbot, show impressive performance on a wide range of linguistic tasks, thus providing cognitive science and linguistics with a computational working model to empirically study different aspects of human language. Here, we use LMs to test the hypothesis that languages with more speakers tend to be easier to learn. In two experiments, we train several LMs, ranging from very simple n-gram models to state-of-the-art deep neural networks, on written cross-linguistic corpus data covering 1293 different languages and statistically estimate learning difficulty. Using a variety of quantitative methods and machine learning techniques to account for phylogenetic relatedness and geographical proximity of languages, we show that there is robust evidence for a relationship between learning difficulty and speaker population size. However, contrary to expectations derived from previous research, our results suggest that languages with more speakers tend to be harder to learn.
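The basic recipe described in the abstract can be illustrated with a toy example. The sketch below is not the authors' code: the corpus snippets, speaker counts, and the add-one-smoothed character-bigram model are illustrative assumptions standing in for the paper's 1293-language corpora and its range of models. It estimates a per-language learning-difficulty score as held-out cross-entropy of a very simple n-gram model (the simplest end of the model range named in the abstract); the paper's further step of relating difficulty to speaker population size while controlling for phylogenetic and geographic relatedness is only indicated in a comment.

```python
import math
from collections import Counter

def bigram_cross_entropy(train_text: str, test_text: str) -> float:
    """Character-bigram cross-entropy (bits per character) with add-one smoothing."""
    bigrams = Counter(zip(train_text, train_text[1:]))
    unigrams = Counter(train_text)
    vocab = len(set(train_text) | set(test_text))
    pairs = list(zip(test_text, test_text[1:]))
    bits = 0.0
    for a, b in pairs:
        # P(b | a) with add-one smoothing over the combined character vocabulary
        p = (bigrams[(a, b)] + 1) / (unigrams[a] + vocab)
        bits += -math.log2(p)
    return bits / len(pairs)

# Hypothetical stand-ins for per-language training/held-out text and speaker counts.
corpora = {
    "eng": ("the dog sleeps . " * 200, "the cat sleeps . " * 40),
    "deu": ("der hund schlaeft . " * 200, "die katze schlaeft . " * 40),
}
speakers = {"eng": 1_500_000_000, "deu": 130_000_000}

difficulty = {lang: bigram_cross_entropy(train, test)
              for lang, (train, test) in corpora.items()}
for lang, h in sorted(difficulty.items(), key=lambda kv: kv[1]):
    print(f"{lang}: {h:.3f} bits/char, {speakers[lang]:,} speakers")
# The paper then asks whether such difficulty estimates increase with speaker
# population size once phylogenetic relatedness and geographic proximity are
# controlled for; that regression step is not reproduced in this sketch.
```

In the study itself this kind of difficulty estimate is computed with far stronger models and across 1293 languages; the toy numbers above only show the direction of the pipeline, not its results.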


[Figures 1-7 of the article are available in the PMC full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10613286/]

Similar articles

1. Languages with more speakers tend to be harder to (machine-)learn. Sci Rep. 2023 Oct 28;13(1):18521. doi: 10.1038/s41598-023-45373-z.
2. What makes a language easy to learn? A preregistered study on how systematic structure and community size affect language learnability. Cognition. 2021 May;210:104620. doi: 10.1016/j.cognition.2021.104620. Epub 2021 Feb 8.
3. Societies of strangers do not speak less complex languages. Sci Adv. 2023 Aug 18;9(33):eadf7704. doi: 10.1126/sciadv.adf7704. Epub 2023 Aug 16.
4. Using hybridization networks to retrace the evolution of Indo-European languages. BMC Evol Biol. 2016 Sep 6;16(1):180. doi: 10.1186/s12862-016-0745-6.
5. A Universal Cognitive Bias in Word Order: Evidence From Speakers Whose Language Goes Against It. Psychol Sci. 2024 Mar;35(3):304-311. doi: 10.1177/09567976231222836. Epub 2024 Feb 22.
6. Word Order Typology Interacts With Linguistic Complexity: A Cross-Linguistic Corpus Study. Cogn Sci. 2020 Apr;44(4):e12822. doi: 10.1111/cogs.12822.
7. Conceptual relations predict colexification across languages. Cognition. 2020 Aug;201:104280. doi: 10.1016/j.cognition.2020.104280. Epub 2020 May 19.
8. From One Bilingual to the Next: An Iterated Learning Study on Language Evolution in Bilingual Societies. Cogn Sci. 2023 May;47(5):e13289. doi: 10.1111/cogs.13289.
9. Different languages, similar encoding efficiency: Comparable information rates across the human communicative niche. Sci Adv. 2019 Sep 4;5(9):eaaw2594. doi: 10.1126/sciadv.aaw2594. eCollection 2019 Sep.
10. Detecting Alzheimer's Disease from Continuous Speech Using Language Models. J Alzheimers Dis. 2019;70(4):1163-1174. doi: 10.3233/JAD-190452.

Cited by

1. The Relationship Between Community Size and Iconicity in Sign Languages. Cogn Sci. 2025 Jun;49(6):e70074. doi: 10.1111/cogs.70074.

References

1. A large quantitative analysis of written language challenges the idea that all languages are equally complex. Sci Rep. 2023 Sep 16;13(1):15351. doi: 10.1038/s41598-023-42327-3.
2. Societies of strangers do not speak less complex languages. Sci Adv. 2023 Aug 18;9(33):eadf7704. doi: 10.1126/sciadv.adf7704. Epub 2023 Aug 16.
3. Emergent analogical reasoning in large language models. Nat Hum Behav. 2023 Sep;7(9):1526-1541. doi: 10.1038/s41562-023-01659-w. Epub 2023 Jul 31.
4. Grambank reveals the importance of genealogical constraints on linguistic diversity and highlights the impact of language loss. Sci Adv. 2023 Apr 21;9(16):eadg6175. doi: 10.1126/sciadv.adg6175. Epub 2023 Apr 19.
5. The debate over understanding in AI's large language models. Proc Natl Acad Sci U S A. 2023 Mar 28;120(13):e2215907120. doi: 10.1073/pnas.2215907120. Epub 2023 Mar 21.
6. Large Language Models Demonstrate the Potential of Statistical Learning in Language. Cogn Sci. 2023 Mar;47(3):e13256. doi: 10.1111/cogs.13256.
7. Imperfect language learning reduces morphological overspecification: Experimental evidence. PLoS One. 2022 Jan 27;17(1):e0262876. doi: 10.1371/journal.pone.0262876. eCollection 2022.
8. One model for the learning of language. Proc Natl Acad Sci U S A. 2022 Feb 1;119(5). doi: 10.1073/pnas.2021865119.
9. Global predictors of language endangerment and the future of linguistic diversity. Nat Ecol Evol. 2022 Feb;6(2):163-173. doi: 10.1038/s41559-021-01604-y. Epub 2021 Dec 16.
10. What makes a language easy to learn? A preregistered study on how systematic structure and community size affect language learnability. Cognition. 2021 May;210:104620. doi: 10.1016/j.cognition.2021.104620. Epub 2021 Feb 8.