Bastos Carlos A C, Afreixo Vera, Pinho Armando J, Garcia Sara P, Rodrigues João M O S, Ferreira Paulo J S G
Signal Processing Lab, IEETA, University of Aveiro, 3810-193 Aveiro, Portugal.
J Integr Bioinform. 2011 Sep 15;8(3):172. doi: 10.2390/biecoll-jib-2011-172.
We study the inter-dinucleotide distance distributions in the human genome, both genome-wide and within protein-coding regions. The inter-dinucleotide distance is defined as the distance from one occurrence of a dinucleotide to the next occurrence of the same dinucleotide. We consider the 16 sequences of inter-dinucleotide distances, one per dinucleotide, and two reading frames. Our results show a period-3 oscillation in the protein-coding inter-dinucleotide distance distributions that is absent from the whole-genome distributions. We also compare the distance distribution of each dinucleotide to a reference distribution, namely that of a random sequence generated with the same dinucleotide abundances; this comparison reveals CG as the dinucleotide with the highest cumulative relative error over the first 60 distances. Moreover, the distance distribution of each dinucleotide is compared to those of all other dinucleotides using the Kullback-Leibler divergence. We find that the distance distributions of a dinucleotide and of its reverse complement are very similar, so the divergence between them is very small. This finding may be evidence of a parity rule stronger than Chargaff's second parity rule.
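The definitions in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, dinucleotide occurrences are located with an overlapping sliding window, distances are measured in nucleotide positions, and the two reading frames mentioned above are not modeled. The small `eps` term in the divergence is likewise an assumption, added only to avoid division by zero on short toy inputs.

```python
from collections import Counter
import math


def inter_dinucleotide_distances(seq, dinuc):
    """Distances between consecutive occurrences of `dinuc` in `seq`.

    Assumption: occurrences are found with an overlapping sliding window
    and distances are counted in nucleotide positions.
    """
    positions = [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc]
    return [b - a for a, b in zip(positions, positions[1:])]


def distance_distribution(distances, max_distance):
    """Relative frequency of each distance 1..max_distance (longer ones are dropped)."""
    counts = Counter(d for d in distances if d <= max_distance)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * max_distance
    return [counts.get(k, 0) / total for k in range(1, max_distance + 1)]


def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(p || q) in nats.

    Assumption: `eps` smooths zero entries of `q`; terms with p_i = 0 contribute 0.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def reverse_complement(s):
    """Reverse complement of a DNA string over the alphabet A, C, G, T."""
    return s.translate(str.maketrans("ACGT", "TGCA"))[::-1]


if __name__ == "__main__":
    toy = "ACGACGTTACGTAC"  # toy sequence, not real genomic data
    dinuc = "AC"
    p = distance_distribution(inter_dinucleotide_distances(toy, dinuc), 10)
    q = distance_distribution(
        inter_dinucleotide_distances(toy, reverse_complement(dinuc)), 10
    )
    print(dinuc, reverse_complement(dinuc), kl_divergence(p, q))
```

On real data, the comparison reported in the abstract would correspond to evaluating `kl_divergence` for every pair of the 16 dinucleotides and inspecting the pairs formed by a dinucleotide and its reverse complement (for example AC and GT).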