Karlin S, Ladunga I, Blaisdell B E
Department of Mathematics, Stanford University, CA 94305-2125.
Proc Natl Acad Sci U S A. 1994 Dec 20;91(26):12837-41. doi: 10.1073/pnas.91.26.12837.
Genomic homogeneity is investigated for a broad base of DNA sequences in terms of dinucleotide relative abundance distances (abbreviated delta-distances) and of oligonucleotide compositional extremes. It is shown that delta-distances between different genomic sequences in the same species are low, only about 2 or 3 times the distance found in random DNA, and are generally smaller than the between-species delta-distances. Extremes in short oligonucleotides include underrepresentation of TpA and overrepresentation of GpC in most temperate bacteriophage sequences; underrepresentation of CTAG in most eubacterial genomes; underrepresentation of GATC in most bacteriophage; CpG suppression in vertebrates, in all animal mitochondrial genomes, and in many thermophilic bacterial sequences; and overrepresentation of GpG/CpC in all animal mitochondrial sets and chloroplast genomes. Interpretations center on DNA structures (dinucleotide stacking energies, DNA curvature and superhelicity, nucleosome organization), context-dependent mutational events, methylation effects, and processes of replication and repair.
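The abstract does not spell out how a delta-distance is computed. As a minimal sketch, assume the definition commonly given in the authors' related work: the relative abundance of a dinucleotide XY is rho_XY = f_XY / (f_X f_Y), with frequencies taken over the sequence combined with its reverse complement, and the delta-distance between two sequences is the mean absolute difference of rho over all 16 dinucleotides. The function names below are hypothetical and chosen for illustration only.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def relative_abundances(seq):
    """Map each of the 16 dinucleotides to rho_XY = f_XY / (f_X * f_Y).

    Assumption: counts are pooled over both strands, so that rho is the
    same for a dinucleotide and its reverse complement.
    """
    seq = seq.upper()
    mono, di = Counter(), Counter()
    for strand in (seq, reverse_complement(seq)):
        # Count single bases, skipping ambiguous symbols such as N.
        mono.update(b for b in strand if b in BASES)
        # Count overlapping dinucleotides made of unambiguous bases only.
        di.update(a + b for a, b in zip(strand, strand[1:])
                  if a in BASES and b in BASES)
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_mono == 0 or n_di == 0:
        raise ValueError("sequence has no countable dinucleotides")
    rho = {}
    for x, y in product(BASES, repeat=2):
        fx, fy = mono[x] / n_mono, mono[y] / n_mono
        fxy = di[x + y] / n_di
        # A base that never occurs gives an undefined ratio; report 0.0.
        rho[x + y] = fxy / (fx * fy) if fx and fy else 0.0
    return rho


def delta_distance(seq_a, seq_b):
    """Mean absolute difference of rho over the 16 dinucleotides."""
    rho_a = relative_abundances(seq_a)
    rho_b = relative_abundances(seq_b)
    return sum(abs(rho_a[k] - rho_b[k]) for k in rho_a) / 16
```

Under these assumptions, a value of rho well below 1 for a dinucleotide (for example CG in a vertebrate sequence) would correspond to the underrepresentation described above, and a small `delta_distance` between two sequences would indicate similar dinucleotide signatures.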