
Rank diversity of languages: generic behavior in computational linguistics.

Author information

Cocho Germinal, Flores Jorge, Gershenson Carlos, Pineda Carlos, Sánchez Sergio

Affiliations

Instituto de Física, Universidad Nacional Autónoma de México, Mexico City, Mexico; Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City, Mexico.

Instituto de Física, Universidad Nacional Autónoma de México, Mexico City, Mexico.

Publication information

PLoS One. 2015 Apr 7;10(4):e0121898. doi: 10.1371/journal.pone.0121898. eCollection 2015.

Abstract

Statistical studies of languages have focused on the rank-frequency distribution of words. Instead, we introduce here a measure of how word ranks change in time and call this distribution rank diversity. We calculate this diversity for books published in six European languages since 1800, and find that it follows a universal lognormal distribution. Based on the mean and standard deviation associated with the lognormal distribution, we define three different word regimes of languages: "heads" consist of words whose rank barely changes in time, "bodies" are words of general use, while "tails" comprise context-specific words whose rank varies considerably in time. The heads and bodies reflect the size of language cores identified by linguists for basic communication. We propose a Gaussian random walk model which reproduces the rank variation of words in time and thus the diversity. Rank diversity of words can be understood as the result of random variations in rank, where the size of the variation depends on the rank itself. We find that the core size is similar for all languages studied.
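
The abstract does not give formulas, so the following Python sketch rests on two stated assumptions rather than on the authors' exact method. First, rank diversity at rank k is taken to be the number of distinct words observed at that rank across all time slices, divided by the number of time slices. Second, the Gaussian random walk is taken to perturb each word's rank by Gaussian noise whose standard deviation is proportional to the rank, after which the words are re-sorted. The function names, the `sigma` parameter, and its default value are hypothetical and chosen only for illustration.

```python
import random


def rank_diversity(rankings):
    """Rank diversity per rank (assumed definition, see lead-in).

    rankings[t][k] is the word holding rank k + 1 in time slice t.
    Returns a list d where d[k] is the number of distinct words seen at
    rank k + 1, divided by the number of time slices.
    """
    n_slices = len(rankings)
    n_ranks = min(len(ranking) for ranking in rankings)
    return [
        len({ranking[k] for ranking in rankings}) / n_slices
        for k in range(n_ranks)
    ]


def simulate_rankings(n_words, n_slices, sigma=0.1, seed=0):
    """Toy Gaussian random walk over ranks (an assumed form of the model).

    At each step every word gets a score equal to its current rank plus
    Gaussian noise with standard deviation sigma * rank, so low-ranked
    (frequent) words move little and high-ranked words move a lot.
    Words are then re-sorted by score to obtain the next ranking.
    """
    rng = random.Random(seed)
    order = list(range(n_words))  # word ids, initially in rank order
    history = [order[:]]
    for _ in range(n_slices - 1):
        scores = {
            word: rank + rng.gauss(0.0, sigma * rank)
            for rank, word in enumerate(order, start=1)
        }
        order = sorted(order, key=scores.__getitem__)
        history.append(order[:])
    return history


if __name__ == "__main__":
    history = simulate_rankings(n_words=200, n_slices=50)
    diversity = rank_diversity(history)
    # Print the diversity at a few ranks to see how it grows with rank.
    for k in (1, 10, 50, 200):
        print(k, round(diversity[k - 1], 2))
```

Under these assumptions, a word that never leaves its rank contributes the minimum diversity of 1 divided by the number of time slices, which is the behavior the abstract attributes to "heads", while ranks that host a different word in every slice reach a diversity of 1, as expected for "tails".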

