Yang Tianguang, Gu Changgui, Yang Huijie
Department of Statistics, School of Mathematical Sciences, Nankai University, Tianjin 300071, P. R. China.
Department of Systems Science, Business School, University of Shanghai for Science and Technology, Shanghai 200093, P. R. China.
PLoS One. 2016 Sep 20;11(9):e0162423. doi: 10.1371/journal.pone.0162423. eCollection 2016.
A sentence is the natural unit of language. Patterns embedded in series of sentences can be used to model the formation and evolution of languages, and to solve practical problems such as evaluating linguistic ability. In this paper, we apply de-trended fluctuation analysis to detect long-range correlations embedded in sentence series from A Story of the Stone, one of the greatest masterpieces of Chinese literature. We identified a weak long-range correlation, with a Hurst exponent of 0.575±0.002 up to a scale of 10^4. We used a structural stability test to confirm the behavior of the long-range correlation, and found that different parts of the series had almost identical Hurst exponents. We found that noisy records can lead to false results and conclusions, even when they make up only a small proportion of the total records (e.g., less than 1%). Thus, the structural stability test, a procedure widely neglected in previous studies, is essential for confirming the existence of long-range correlations. Furthermore, a combination of de-trended fluctuation analysis and diffusion entropy analysis showed that the sentence series was generated by a fractional Brownian motion.
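The three techniques named in the abstract (de-trended fluctuation analysis, a structural stability check, and diffusion entropy analysis) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions and not the authors' code. It assumes that the "sentence series" is a plain numeric sequence, such as sentence lengths; the abstract does not say which observable is used. It also assumes first-order detrending (DFA-1) and a stability check that splits the series into equal contiguous parts. The DEA entropy is estimated with a fixed-bin histogram, which is one common choice among several. The function names `dfa_fluctuation`, `hurst_dfa`, `segment_hursts` and `dea_delta` are hypothetical.

```python
import numpy as np


def dfa_fluctuation(x, scales, order=1):
    """Detrended fluctuation F(s) of series x at each window size s."""
    x = np.asarray(x, dtype=float)
    # Profile: cumulative sum of the mean-removed series.
    y = np.cumsum(x - x.mean())
    F = []
    for s in scales:
        n_seg = len(y) // s
        # Non-overlapping windows of length s (the leftover tail is discarded).
        segs = y[: n_seg * s].reshape(n_seg, s)
        t = np.arange(s)
        sq = []
        for seg in segs:
            # Fit and subtract a local polynomial trend (order 1 = DFA-1).
            trend = np.polyval(np.polyfit(t, seg, order), t)
            sq.append(np.mean((seg - trend) ** 2))
        # Root-mean-square fluctuation over all windows.
        F.append(np.sqrt(np.mean(sq)))
    return np.array(F)


def hurst_dfa(x, scales, order=1):
    """Slope of log F(s) vs log s; about 0.5 for uncorrelated data."""
    F = dfa_fluctuation(x, scales, order)
    slope, _ = np.polyfit(np.log(scales), np.log(F), 1)
    return slope


def segment_hursts(x, scales, n_parts=4, order=1):
    """Hypothetical stability check: DFA exponent of each contiguous part."""
    parts = np.array_split(np.asarray(x, dtype=float), n_parts)
    return [hurst_dfa(p, scales, order) for p in parts]


def dea_delta(x, windows, bins=50):
    """Diffusion entropy exponent delta from the fit S(t) = A + delta * ln t."""
    x = np.asarray(x, dtype=float)
    c = np.concatenate([[0.0], np.cumsum(x - x.mean())])
    S = []
    for t in windows:
        # Displacements (window sums) over all overlapping windows of length t.
        disp = c[t:] - c[:-t]
        p, edges = np.histogram(disp, bins=bins, density=True)
        width = edges[1] - edges[0]
        p = p[p > 0]
        # Histogram estimate of the differential Shannon entropy.
        S.append(-np.sum(p * np.log(p)) * width)
    delta, _ = np.polyfit(np.log(windows), S, 1)
    return delta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in data: white noise, NOT the sentence series from the paper.
    series = rng.standard_normal(2**14)
    scales = [16, 32, 64, 128, 256, 512, 1024]
    print("H (DFA):", hurst_dfa(series, scales))
    print("H per quarter:", segment_hursts(series, scales[:5]))
    print("delta (DEA):", dea_delta(series, [4, 8, 16, 32, 64, 128]))
```

For the white-noise stand-in, all exponents should come out near 0.5. On real data, the structural stability test looks for similar exponents across the parts. A marked outlier in one part could signal, for example, a stretch of noisy records. According to the usual reading of DEA, close agreement between the DFA exponent and the DEA exponent δ is consistent with Gaussian, fractional-Brownian-motion scaling. A clear mismatch would point to non-Gaussian scaling.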