Melnik S S, Usatenko O V
A. Ya. Usikov Institute for Radiophysics and Electronics, Ukrainian Academy of Science, 12 Proskura Street, 61805 Kharkov, Ukraine.
Comput Biol Chem. 2014 Dec;53 Pt A:26-31. doi: 10.1016/j.compbiolchem.2014.08.006. Epub 2014 Aug 27.
We analyze the structure of DNA molecules of different organisms using the additive Markov chain approach. Transforming nucleotide sequences into binary strings, we perform a statistical analysis of the corresponding "texts". We develop the theory of N-step additive binary stationary ergodic Markov chains and analyze their differential entropy. Assuming that the correlations are weak, we express the conditional probability function of the chain in terms of the pair correlation function and represent the entropy as a functional of the pair correlator. Because the model uses two-point correlators instead of block occurrence probabilities, it makes it possible to calculate the entropy of subsequences at much longer distances than standard methods allow. We use the resulting analytical expression to evaluate numerically the entropy of coarse-grained DNA texts. We believe that the entropy study can be used for the biological classification of living species.
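The first two steps described in the abstract, mapping nucleotides to a binary string and estimating the pair correlation function, can be sketched as follows. This is a minimal illustration, not the authors' implementation. The purine/pyrimidine mapping and the names `PURINE_MAP`, `to_binary`, `pair_correlation` and `normalized_correlation` are assumptions introduced here; the abstract does not say which binary mapping was used.

```python
import numpy as np

# Assumed mapping (not specified in the abstract):
# purines (A, G) -> 1, pyrimidines (C, T) -> 0.
PURINE_MAP = {"A": 1, "G": 1, "C": 0, "T": 0}


def to_binary(seq, mapping=PURINE_MAP):
    """Convert a nucleotide string into an array of 0s and 1s."""
    return np.array([mapping[ch] for ch in seq.upper()], dtype=float)


def pair_correlation(a, max_r):
    """Estimate C(r) = <(a_i - mean)(a_{i+r} - mean)> for r = 0..max_r."""
    d = a - a.mean()
    n = len(a)
    # Average the product over all pairs of sites separated by r.
    return np.array(
        [np.dot(d[: n - r], d[r:]) / (n - r) for r in range(max_r + 1)]
    )


def normalized_correlation(a, max_r):
    """Return K(r) = C(r) / C(0); undefined for a constant sequence."""
    c = pair_correlation(a, max_r)
    return c / c[0]
```

For example, `normalized_correlation(to_binary("ACACACAC"), 2)` describes a strictly alternating binary sequence, whose correlator alternates in sign with the distance `r`. For real genomic data, `max_r` should stay well below the sequence length so that each `C(r)` is averaged over many pairs.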