Karlin S, Mrázek J
Department of Mathematics, Stanford University, Stanford, CA 94305-2125, USA.
Proc Natl Acad Sci U S A. 1997 Sep 16;94(19):10227-32. doi: 10.1073/pnas.94.19.10227.
Eukaryotic genome similarity relationships are inferred using sequence information derived from large aggregates of genomic sequences. Comparisons of sample sequences within and between species are based on the profile of dinucleotide relative abundance values. The profile is ρXY = fXY/(fX·fY) for all dinucleotides XY, where fX denotes the frequency of the nucleotide X and fXY denotes the frequency of the dinucleotide XY, both computed from the sequence concatenated with its inverted complement. Previous studies of prokaryotes and this study document that profiles of different DNA sequence samples (sample size ≥50 kb) from the same organism are generally much more similar to each other than they are to profiles from other organisms, and that closely related organisms generally have more similar profiles than do distantly related organisms. On this basis we refer to the collection {ρXY} as the genome signature. This paper identifies ρXY extremes and compares genome signature differences for a diverse range of eukaryotic species. Interpretations of the mechanisms maintaining these profile differences center on genome-wide replication, repair, DNA structures, and context-dependent mutational biases. It is also observed that mitochondrial genome signature differences between species parallel the corresponding nuclear genome signature differences, despite large differences between corresponding mitochondrial and nuclear signatures. The genome signature differences also have implications for contrasts between rodents and other mammals and between monocot and dicot plants, and they provide evidence for similarities among fungi and for the diversity of protists.
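The profile definition above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `genome_signature` and `signature_distance` are hypothetical, dinucleotides are counted on each strand separately (an assumption that avoids the artificial dinucleotide at the junction of a literal concatenation), and the mean absolute difference used to compare two profiles is one simple choice assumed here, since the abstract does not specify the distance measure.

```python
from collections import Counter
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]


def reverse_complement(seq):
    """Return the inverted (reverse) complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def genome_signature(seq):
    """Compute rho_XY = f_XY / (f_X * f_Y) over the sequence plus its
    inverted complement. Characters other than A, C, G, T are ignored."""
    seq = seq.upper()
    strands = (seq, reverse_complement(seq))

    mono = Counter()
    di = Counter()
    for strand in strands:
        mono.update(strand)
        # Count overlapping dinucleotides within each strand only
        # (assumption: no dinucleotide is formed across the junction).
        di.update(strand[i:i + 2] for i in range(len(strand) - 1))

    n_mono = sum(mono[b] for b in "ACGT")
    n_di = sum(di[d] for d in _DINUCS)

    rho = {}
    for d in _DINUCS:
        fx = mono[d[0]] / n_mono
        fy = mono[d[1]] / n_mono
        fxy = di[d] / n_di
        # rho is undefined when either nucleotide is absent from the sample.
        rho[d] = fxy / (fx * fy) if fx > 0 and fy > 0 else float("nan")
    return rho


def signature_distance(rho_a, rho_b):
    """Mean absolute difference between two profiles (assumed measure)."""
    return sum(abs(rho_a[d] - rho_b[d]) for d in _DINUCS) / len(_DINUCS)


if __name__ == "__main__":
    sample = "ACGTTGCAACGGTTAACCGT"  # toy input, far below the >=50 kb used in the paper
    for dinuc, value in genome_signature(sample).items():
        print(dinuc, round(value, 3))
```

Because both strands contribute, the profile is symmetric under reverse complementation (for example, the values for AC and GT coincide), so only 10 of the 16 values are distinct. Samples as short as the toy input give noisy values; the abstract's comparisons rely on samples of at least 50 kb.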