The natural selection of words: Finding the features of fitness.

Affiliations

Ronin Institute, Montclair, New Jersey, United States of America.

National Research Council Canada, Ottawa, Ontario, Canada.

Publication information

PLoS One. 2019 Jan 28;14(1):e0211512. doi: 10.1371/journal.pone.0211512. eCollection 2019.

DOI: 10.1371/journal.pone.0211512
PMID: 30689665
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6349325/

Abstract

We introduce a dataset for studying the evolution of words, constructed from WordNet and the Google Books Ngram Corpus. The dataset tracks the evolution of 4,000 synonym sets (synsets), containing 9,000 English words, from 1800 AD to 2000 AD. We present a supervised learning algorithm that is able to predict the future leader of a synset: the word in the synset that will have the highest frequency. The algorithm uses features based on a word's length, the characters in the word, and the historical frequencies of the word. It can predict change of leadership (including the identity of the new leader) fifty years in the future, with an F-score considerably above random guessing. Analysis of the learned models provides insight into the causes of change in the leader of a synset. The algorithm confirms observations linguists have made, such as the trend to replace the -ise suffix with -ize, the rivalry between the -ity and -ness suffixes, and the struggle between economy (shorter words are easier to remember and to write) and clarity (longer words are more distinctive and less likely to be confused with one another). The results indicate that integration of the Google Books Ngram Corpus with WordNet has significant potential for improving our understanding of how language evolves.
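
The abstract's notions of a synset "leader" and of features built from word length, characters, and historical frequencies can be illustrated with a minimal sketch. This is not the authors' implementation: the toy frequencies, the function names (`leader`, `features`, `leadership_changes`), and the exact feature encoding are all assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical toy data: relative frequency of each word in a synset
# over three consecutive time periods. The numbers are invented for
# illustration and are not taken from the paper's dataset.
SYNSET = {
    "civilise": [0.60, 0.50, 0.35],
    "civilize": [0.40, 0.50, 0.65],
}


def leader(synset, period):
    """Return the word with the highest frequency in the given period.

    Ties are broken alphabetically so the result is deterministic
    (an assumption of this sketch, not a rule from the paper).
    """
    return max(sorted(synset), key=lambda word: synset[word][period])


def features(word, history):
    """Build a simple feature dict from length, characters, and past frequencies."""
    feats = {"length": len(word)}
    # One count feature per character that occurs in the word.
    for ch, count in Counter(word).items():
        feats[f"char_{ch}"] = count
    # One feature per observed historical frequency.
    for i, freq in enumerate(history):
        feats[f"freq_t{i}"] = freq
    return feats


def leadership_changes(synset, start, end):
    """Return True if the leader differs between the two periods."""
    return leader(synset, start) != leader(synset, end)
```

In this toy synset the leader shifts from "civilise" to "civilize" between the first and last period, which mirrors the -ise to -ize trend mentioned in the abstract. A supervised learner would take feature dicts like these as input and the future leader as the target label.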
