A common construction pattern of English words and Chinese characters.

Affiliation

Department of Physics and State Key Laboratory of Surface Physics, Fudan University, Shanghai, China.

Publication information

PLoS One. 2013 Sep 2;8(9):e74515. doi: 10.1371/journal.pone.0074515. eCollection 2013.

Abstract

Rankings are ubiquitous around the world. Here I investigate spatial ranking patterns of English words and Chinese characters, and reveal a common construction pattern related to phase separation. In detail, I analyze a list of different words in the English language and find that the frequency of the number of letters per word decays, linearly or nonlinearly, with its rank in the frequency table. I interpret the linearly decaying region as a linear phase that covers 96.4% of the words, in sharp contrast to a nonlinear phase (representing the nonlinearly decaying region) that covers the remaining 3.6%. Remarkably, the same phase separation, with the same two percentages of 96.4% and 3.6%, also holds for the relation between strokes and characters in the Chinese language, even though English and Chinese are two distinctly different language systems. The common construction pattern originates from the log-normal distributions of the frequencies of words or characters, which can be understood as the joint effect of the Weber-Fechner law in psychophysics and the principle of maximum entropy in information theory.
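As a rough illustration of the kind of rank-frequency analysis described above, the Python sketch below counts how many words in a list have each length, ranks those lengths by count, reports the share of words covered by the top-ranked lengths, and fits a straight line to a chosen block of ranks. The function names, the tie-breaking rule, and the cutoff `top_k` are assumptions made for illustration. This is not the author's code, and it does not reproduce the 96.4%/3.6% split, which depends on the paper's own word list and phase criteria.

```python
from collections import Counter


def length_rank_table(words):
    """Rank word lengths by how many words have that length (most common first).

    Returns a list of (rank, length, count) tuples. Ties in count are broken
    by shorter length first (an assumption, chosen only for determinism).
    """
    counts = Counter(len(w) for w in words)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, length, count) for rank, (length, count) in enumerate(ordered, start=1)]


def coverage(table, top_k):
    """Fraction of all words whose length falls within ranks 1..top_k."""
    total = sum(count for _, _, count in table)
    covered = sum(count for rank, _, count in table if rank <= top_k)
    return covered / total


def linear_fit(table, top_k):
    """Ordinary least-squares slope and intercept of count vs. rank over ranks 1..top_k."""
    xs = [rank for rank, _, _ in table if rank <= top_k]
    ys = [count for rank, _, count in table if rank <= top_k]
    if len(xs) < 2:
        raise ValueError("need at least two ranks to fit a line")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # A toy word list (hypothetical), not the corpus analyzed in the paper.
    words = ["a", "an", "the", "cat", "dog", "bird", "house", "at"]
    table = length_rank_table(words)
    print(table)
    print(coverage(table, 2))
    print(linear_fit(table, 3))
```

In practice one would choose `top_k` by checking where a straight-line fit stops describing the ranked counts well; how the paper draws that boundary is not stated in the abstract.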

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3b75/3759465/f718d1d1ec30/pone.0074515.g001.jpg