Mitchell David, Bridge Robert
Vice Deanery of Genetics and Microbiology, Trinity College, Dublin, Ireland.
Biochem Biophys Res Commun. 2006 Jun 2;344(2):612-6. doi: 10.1016/j.bbrc.2006.03.182. Epub 2006 Apr 6.
While veritable oceans of ink have been spilled over the base distributions within genes, the literature is virtually silent on large-scale intragenomic base distribution. To address this issue, we examined approximately 3400 chromosomal sequences from approximately 2000 entire genomes, including DNA and RNA, single- and double-stranded, and coding and non-coding genomes. For each sequence, the mean, variance, skewness, and kurtosis for each base were computed, along with the genome base composition. The main findings are: (1) there is no simple relationship between these statistics and the base composition of the genome, (2) in non-viral genomes, base distribution is non-uniform, (3) base distribution in non-eukaryotic genomes obeys a number of simple rules, (4) these rules do not depend on the presence of coding sequences, (5) bacterial genomes in particular are unusually compliant with these rules, and (6) eukaryotes have a unique pattern of base distribution.
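The abstract does not say how the per-base statistics were defined. As a minimal sketch only, the following assumes they are moments of the positions at which each base occurs, with positions scaled to the interval (0, 1). The function name `base_position_stats`, the position scaling, and the use of population moments with excess kurtosis are all illustrative assumptions, not the authors' method. Under this assumption, a base spread uniformly along a sequence would give a mean near 0.5, a variance near 1/12, a skewness near 0, and an excess kurtosis near -1.2.

```python
from collections import defaultdict


def base_position_stats(seq):
    """Per-base composition and positional moments (hypothetical definition).

    Assumption: each occurrence of a base contributes its position,
    scaled to (0, 1), and the statistics are population moments of
    those positions. Kurtosis is reported as excess kurtosis.
    """
    seq = seq.upper()
    n = len(seq)
    positions = defaultdict(list)
    for i, base in enumerate(seq):
        # Use the midpoint of each site so positions lie strictly in (0, 1).
        positions[base].append((i + 0.5) / n)

    stats = {}
    for base, xs in positions.items():
        k = len(xs)
        mean = sum(xs) / k
        # Central moments of order 2, 3, and 4.
        m2 = sum((x - mean) ** 2 for x in xs) / k
        m3 = sum((x - mean) ** 3 for x in xs) / k
        m4 = sum((x - mean) ** 4 for x in xs) / k
        # Guard against a base that occurs at a single position.
        skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
        kurtosis = m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0
        stats[base] = {
            "fraction": k / n,  # base composition
            "mean": mean,
            "variance": m2,
            "skewness": skewness,
            "kurtosis": kurtosis,
        }
    return stats


if __name__ == "__main__":
    # Toy sequence: A confined to the first half, C to the second half.
    for base, s in sorted(base_position_stats("A" * 10 + "C" * 10).items()):
        print(base, s)
```

In the toy sequence both bases have the same composition (0.5) but different positional means, which illustrates how such statistics can vary independently of base composition.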