TALP Research Center, Computer Science Department, Universitat Politècnica de Catalunya, Barcelona, Catalonia, Spain.
LARCA Research Group, Complexity and Quantitative Linguistics Laboratory, Computer Science Department, Universitat Politècnica de Catalunya, Barcelona, Catalonia, Spain.
PLoS One. 2021 Dec 16;16(12):e0260849. doi: 10.1371/journal.pone.0260849. eCollection 2021.
In his pioneering research, G. K. Zipf formulated two statistical laws on the relationship between the frequency of a word and its number of meanings: the law of meaning distribution, relating the number of meanings of a word to its frequency rank, and the meaning-frequency law, relating the frequency of a word to its number of meanings. Although these laws were formulated more than half a century ago, they have been investigated in only a few languages. Here we present the first study of these laws in Catalan. We verify these laws in Catalan through the relationship between their exponents and the exponent of the rank-frequency law. We present a new protocol for the analysis of these Zipfian laws that can be extended to other languages. We report the first evidence of two marked regimes for these laws in written language and speech, paralleling the two regimes in Zipf's rank-frequency law that were discovered in large multi-author corpora in the early 2000s. Finally, we discuss the implications of these two regimes.
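The abstract does not write out the laws, so the notation here is an assumption for illustration. Suppose the rank-frequency law is f ∝ i^(-α), the law of meaning distribution is μ ∝ i^(-γ), and the meaning-frequency law is μ ∝ f^δ, where i is the frequency rank of a word, f its frequency and μ its number of meanings. Eliminating i from the first two laws gives μ ∝ f^(γ/α), so one would expect δ ≈ γ/α. The minimal sketch below checks this on synthetic data with hypothetical exponents, fitting each law by ordinary least squares on log-log scales. This fitting method is an assumption and is not necessarily the protocol used in the study.

```python
import math
import random

def loglog_slope(xs, ys):
    """Ordinary least-squares slope of log(y) regressed on log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx

# Hypothetical exponents for illustration only; they are NOT estimates from the study.
ALPHA = 1.0  # rank-frequency law:          f  ∝ i^(-ALPHA)
GAMMA = 0.5  # law of meaning distribution: mu ∝ i^(-GAMMA)

random.seed(0)  # fixed seed so the synthetic "corpus" is reproducible
ranks = range(1, 1001)
# Synthetic word frequencies and meaning counts with small multiplicative (log-normal) noise.
freqs = [1e6 * i ** -ALPHA * random.lognormvariate(0, 0.05) for i in ranks]
meanings = [20.0 * i ** -GAMMA * random.lognormvariate(0, 0.05) for i in ranks]

alpha_hat = -loglog_slope(ranks, freqs)     # estimated rank-frequency exponent
gamma_hat = -loglog_slope(ranks, meanings)  # estimated meaning-distribution exponent
delta_hat = loglog_slope(freqs, meanings)   # meaning-frequency law: mu ∝ f^delta

print(f"alpha = {alpha_hat:.3f}, gamma = {gamma_hat:.3f}")
print(f"delta = {delta_hat:.3f} vs. predicted gamma/alpha = {gamma_hat / alpha_hat:.3f}")
```

With real corpus data, a clear gap between the fitted δ and γ/α would suggest that at least one of the three laws fits poorly over the range examined, for example because different regimes hold for low and high ranks.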