Borodovskiĭ M Iu, Sprizhitskiĭ Iu A, Golovanov E I, Aleksandrov A A
Mol Biol (Mosk). 1986 Jul-Aug;20(4):1024-33.
We introduced non-stationary Markov chains for the statistical description of the structural domains of E. coli DNA. The values of all parameters needed for these chains were determined by preliminary statistical processing of a large set of E. coli coding regions. Non-stationary models were shown to predict the frequencies of occurrence of various nucleotide combinations within coding fragments of DNA better than stationary ones. In particular, non-stationary models give a good approximation of both the short-distance and the long-distance arrangement of nucleotides in coding regions. The correlation parameters for neighbouring codons, and for neighbouring amino acid residues in the primary structure of E. coli proteins, were determined from the second-order non-stationary model. Statistical tests showed that neighbouring residues in polypeptide chains cannot be considered independent. The new model of the DNA structural domain may be used in computer algorithms for the recognition and classification of DNA functional regions.
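The contrast between stationary and non-stationary chains described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: the chain is first order (the abstract also mentions a second-order model), the non-stationarity is modelled as a transition table that depends on codon position (period 3), and parameters are estimated by counting with a pseudocount. The names `fit_markov` and `log_likelihood`, and the toy sequences, are hypothetical and are not taken from the paper.

```python
import math

NUCLEOTIDES = "ACGT"


def fit_markov(seqs, period, pseudocount=1.0):
    """Estimate first-order transition probabilities, one table per phase.

    period=3 gives a chain whose transition table depends on codon position
    (a non-stationary chain, under the assumption stated above);
    period=1 gives an ordinary stationary chain.
    Sequences are assumed to be in frame, i.e. to start at codon position 0.
    """
    counts = [
        {a: {b: pseudocount for b in NUCLEOTIDES} for a in NUCLEOTIDES}
        for _ in range(period)
    ]
    for seq in seqs:
        for i in range(1, len(seq)):
            prev, cur = seq[i - 1], seq[i]
            # Skip transitions that involve ambiguous symbols such as N.
            if prev in NUCLEOTIDES and cur in NUCLEOTIDES:
                counts[i % period][prev][cur] += 1
    model = []
    for table in counts:
        # Normalise each row of counts into conditional probabilities.
        model.append({
            a: {b: table[a][b] / sum(table[a].values()) for b in NUCLEOTIDES}
            for a in NUCLEOTIDES
        })
    return model


def log_likelihood(seq, model):
    """Log-probability of the transitions in seq (first symbol not scored)."""
    period = len(model)
    total = 0.0
    for i in range(1, len(seq)):
        prev, cur = seq[i - 1], seq[i]
        if prev in NUCLEOTIDES and cur in NUCLEOTIDES:
            total += math.log(model[i % period][prev][cur])
    return total


# Invented toy data, for illustration only (not E. coli sequences).
coding_seqs = ["ATGGCTAAAGGTCTGTAA", "ATGAAACGTGCTGAATAA"]
periodic = fit_markov(coding_seqs, period=3)
stationary = fit_markov(coding_seqs, period=1)

query = "ATGGCTGAAAAATAA"
print(log_likelihood(query, periodic), log_likelihood(query, stationary))
```

Comparing the two log-likelihoods on held-out coding sequences is one simple way to check whether the position-dependent model describes the data better. With only two toy sequences the numbers printed here carry no meaning.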