Zipf's law in China's local government work reports: A 21-year study using natural language processing and regression analysis.

Author information

Li Yanfang

Affiliation

School of International Languages, Xiamen University of Technology, Xiamen, Fujian, China.

Publication information

PLoS One. 2025 May 20;20(5):e0324713. doi: 10.1371/journal.pone.0324713. eCollection 2025.

Abstract

The examination and application of Zipf's law is a significant topic in quantitative linguistics. This study presents an in-depth empirical investigation of this law in 651 Chinese provincial government work reports (2003-2023). Employing natural language processing techniques (including Jieba word segmentation with a custom dictionary) and a double-logarithmic regression model, we analyzed word frequency distributions. Our findings indicate that the Zipf coefficient in these reports is close to 1, confirming general adherence to Zipf's law. Over the 21-year period, the Zipf coefficient fluctuates, with a notable inflection point in 2011, after which it follows a consistent upward trend. This shift is likely influenced by the 18th National Congress of the Communist Party of China, which marked a transition toward more standardized and centralized policy communication. While regional differences among eastern, central, western, and northeastern provinces are minimal, centrally governed municipalities exhibit higher Zipf coefficients than other provincial-level regions. Although our findings largely confirm the applicability of Zipf's law to this specific corpus, this study is limited by the exclusion of prefecture- and county-level reports. Future research can address this limitation by incorporating a broader range of administrative levels and by conducting cross-country and cross-cultural comparisons of political documents. Further investigation of other quantitative linguistic laws (e.g., Heaps' law, Menzerath's law) within this corpus is also warranted.
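
The double-logarithmic regression mentioned in the abstract can be illustrated with a minimal sketch. This is not the author's code: it assumes the text has already been segmented into words (the study uses Jieba with a custom dictionary, a step omitted here), and the function name `zipf_coefficient` and the synthetic token list are hypothetical. It fits one common form of the model, log f = c - α log r, by ordinary least squares, where f is a word's frequency, r its frequency rank, and α the Zipf coefficient.

```python
import math
from collections import Counter


def zipf_coefficient(tokens):
    """Estimate the Zipf coefficient by OLS on log(frequency) vs. log(rank).

    Returns (alpha, intercept, r_squared), where alpha is the negated slope
    of the fitted line log f = intercept - alpha * log r.
    """
    # Word frequencies sorted from most to least frequent; rank 1 is the top word.
    counts = sorted(Counter(tokens).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct words")

    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(freq) for freq in counts]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Coefficient of determination of the log-log fit.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return -slope, intercept, r_squared


# Synthetic, hypothetical data: word i occurs about 1000 / i times,
# so the fitted coefficient should come out near 1.
tokens = [f"w{i}" for i in range(1, 51) for _ in range(round(1000 / i))]
alpha, intercept, r_squared = zipf_coefficient(tokens)
print(f"alpha={alpha:.3f}, R^2={r_squared:.3f}")
```

In a setting like the one described, such a function would presumably be applied to each report (or each year's pooled reports) after segmentation, giving one coefficient per unit that can then be compared across years and regions.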

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6281/12091883/bb4a9ab4eadf/pone.0324713.g001.jpg
