A Study on Differences between Simplified and Traditional Chinese Based on Complex Network Analysis of the Word Co-Occurrence Networks.

Author information

Jiang Zhongqiang, Zhao Dongmei, Zheng Jiangbin, Chen Yidong

Affiliations

China Mobile (Suzhou) Software Technology Co., Ltd., Suzhou, China.

Department of Artificial Intelligence, School of Informatics, Xiamen University, Xiamen 361005, China.

Publication information

Comput Intell Neurosci. 2020 Dec 3;2020:8863847. doi: 10.1155/2020/8863847. eCollection 2020.

Abstract

Most existing work comparing simplified and traditional Chinese focuses only on the character or lexical level and does not consider global differences. To address this gap, this paper applies complex network analysis to word co-occurrence networks, an approach that has been used successfully in language analysis research and that can capture global characteristics, to explore the differences between simplified and traditional Chinese. Specifically, we first constructed word co-occurrence networks for simplified and traditional Chinese from selected news corpora. We then applied complex network analysis methods, including network statistics analysis, kernel lexicon comparison, and motif analysis, to obtain a global view of these networks, and compared the networks on the basis of the resulting properties. The comparison yields three findings. First, the co-occurrence networks of both simplified and traditional Chinese are small-world and scale-free networks. However, for the same corpus size, the traditional Chinese networks tend to have more nodes, possibly because of the many one-to-many character/word mappings from simplified to traditional Chinese. Second, because traditional Chinese retains more ancient Chinese words and uses fewer weak verbs, the traditional Chinese kernel lexicons contain more entries than the simplified Chinese kernel lexicons. Third, motif analysis reveals no difference between each simplified Chinese network and its corresponding traditional Chinese network, which indicates that simplified and traditional Chinese are semantically consistent.
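The abstract does not say exactly how the networks were built or measured, so the following is only a rough sketch under stated assumptions, not the authors' code. It assumes the corpus is already word-segmented and links two distinct words when they fall inside a fixed window (`window=2`, i.e. adjacent words, is an assumed parameter). It then computes the statistics commonly used to call a network small-world: the average clustering coefficient C and the average shortest path length L, compared with the standard Erdős–Rényi random-graph estimates ⟨k⟩/N and ln N / ln ⟨k⟩. A degree histogram is included for eyeballing a heavy-tailed (scale-free) degree distribution. All function names and the toy sentences are hypothetical.

```python
import math
from collections import Counter, deque
from itertools import combinations


def build_cooccurrence_network(sentences, window=2):
    """Undirected co-occurrence graph as {word: set(neighbours)}.

    Assumption: two distinct words are linked if they occur within `window`
    consecutive tokens of the same sentence; window=2 links adjacent words only.
    """
    graph = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            graph.setdefault(word, set())
            for other in tokens[i + 1 : i + window]:
                if other != word:  # no self-loops
                    graph[word].add(other)
                    graph.setdefault(other, set()).add(word)
    return graph


def average_clustering(graph):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for node, nbrs in graph.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
        total += 2 * links / (k * (k - 1))
    return total / len(graph) if graph else 0.0


def average_shortest_path(graph):
    """Mean BFS distance over all ordered pairs of mutually reachable nodes."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def degree_histogram(graph):
    """{degree: node count}; a heavy tail on log-log axes hints at scale-free."""
    return dict(sorted(Counter(len(n) for n in graph.values()).items()))


def small_world_summary(graph):
    """Compare C and L with Erdos-Renyi estimates for the same N and <k>."""
    if not graph:
        raise ValueError("empty graph")
    n = len(graph)
    mean_k = sum(len(nbrs) for nbrs in graph.values()) / n
    return {
        "nodes": n,
        "mean_degree": mean_k,
        "C": average_clustering(graph),
        "C_random": mean_k / n,
        "L": average_shortest_path(graph),
        "L_random": math.log(n) / math.log(mean_k) if mean_k > 1 else float("inf"),
    }


if __name__ == "__main__":
    # Toy, pre-segmented sentences (hypothetical data, not the paper's corpora).
    simplified = [["经济", "发展", "很", "快"], ["经济", "发展", "需要", "人才"]]
    traditional = [["經濟", "發展", "很", "快"], ["經濟", "發展", "需要", "人才"]]
    for name, corpus in [("simplified", simplified), ("traditional", traditional)]:
        net = build_cooccurrence_network(corpus)
        print(name, small_world_summary(net), degree_histogram(net))
```

On real corpora, a network is usually described as small-world when C is much larger than C_random while L stays close to L_random; confirming scale-free behavior would need a proper power-law fit rather than the raw histogram shown here.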
