Oliver J L, Bernaola-Galván P, Guerrero-García J, Román-Roldán R
Department of Genetics, Faculty of Sciences, University of Granada, Spain.
J Theor Biol. 1993 Feb 21;160(4):457-70. doi: 10.1006/jtbi.1993.1030.
A new method to determine entropic profiles in DNA sequences is presented. It is based on the chaos-game representation (CGR) of gene structure, a technique which produces a fractal-like picture of DNA sequences. First, the CGR image is divided into squares of size 4^-m (m being the desired resolution), and the point density in each square is counted. Second, appropriate intervals are chosen and a histogram of the densities is prepared. Third, Shannon's formula is applied to the resulting probability distribution, yielding a new entropic estimate for DNA sequences, the histogram entropy, a measure related to the level of constraints on the DNA sequence. Lastly, the entropic profile of the sequence is drawn from the entropies at each resolution level, providing a way to summarize the complexity of large genomic regions, or even entire genomes, at different resolution levels. Application of the method to DNA sequences reveals that entropic profiles obtained in this way, unlike previously published ones, clearly discriminate between random and natural DNA sequences. Entropic profiles also show different degrees of variability within and between genomes. The results of these analyses are discussed in relation both to genome compartmentalization in vertebrates and to the differential action of compositional and/or functional constraints on DNA sequences.
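The four steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the corner assignment of the four bases, the reading of "4^-m in size" as the area of each square in a unit-square image, the equal-width binning, the default of 10 bins, the base-2 logarithm, and all function names (`cgr_points`, `cell_counts`, `histogram_entropy`, `entropic_profile`) are choices made here for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter

# Assumption: this corner assignment is one common CGR convention;
# the abstract does not say which one the authors used.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Chaos game: start at the centre and move halfway to each base's corner."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def cell_counts(points, m):
    """Step 1: count points per square on a 2**m x 2**m grid (area 4**-m each)."""
    side = 2 ** m
    counts = Counter()
    for x, y in points:
        i = min(int(x * side), side - 1)
        j = min(int(y * side), side - 1)
        counts[(i, j)] += 1
    return counts, side * side


def histogram_entropy(points, m, n_bins=10):
    """Steps 2 and 3: bin the square densities, then apply Shannon's formula."""
    counts, n_cells = cell_counts(points, m)
    # Empty squares have density 0 and must be included in the histogram.
    densities = list(counts.values()) + [0] * (n_cells - len(counts))
    max_density = max(densities)
    if max_density == 0:
        return 0.0
    # Assumption: equal-width intervals over [0, max_density].
    hist = Counter(
        min(int(d * n_bins / max_density), n_bins - 1) for d in densities
    )
    total = len(densities)
    return -sum((c / total) * math.log2(c / total) for c in hist.values())


def entropic_profile(seq, max_m=6, n_bins=10):
    """Step 4: histogram entropy at each resolution level m = 1..max_m."""
    points = cgr_points(seq)
    return [histogram_entropy(points, m, n_bins) for m in range(1, max_m + 1)]
```

Calling `entropic_profile(sequence, max_m=6)` returns one entropy value per resolution level, which can then be plotted against m. With this binning each value lies between 0 and log2(`n_bins`), so the number of bins sets the scale of the profile and would need to be fixed when comparing sequences.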