Karlin S, Ladunga I
Department of Mathematics, Stanford University, CA 94305-2125.
Proc Natl Acad Sci U S A. 1994 Dec 20;91(26):12832-6. doi: 10.1073/pnas.91.26.12832.
A method for assessing genomic similarity based on relative abundances of short oligonucleotides in large DNA samples is introduced. The method requires neither homologous sequences nor prior sequence alignments. The analysis centers on (i) dinucleotide (and tri- and tetranucleotide) relative abundance extremes in genomic sequences, (ii) distances between sequences based on all dinucleotide relative abundance values, and (iii) a multidimensional partial ordering protocol. The emphasis in this paper is on assessments of general relatedness of genomes as distinguished from phylogenetic reconstructions. Our methods demonstrate that the relative abundance distances are almost always larger for interspecific genomic sequence comparisons than for intraspecific genomic sequence comparisons, indicating congruence over different genome sequence samples. The genomic comparisons are generally concordant with accepted phylogenies among vertebrate and among fungal species sequences. Several unexpected relationships among the major groups of metazoan, fungal, and protist DNA emerge, including the following. (i) Schizosaccharomyces pombe and Saccharomyces cerevisiae are as similar to each other in dinucleotide relative abundance distance as human is to bovine. (ii) S. cerevisiae, although far from the vertebrates, is significantly closer to them than are the invertebrates (Drosophila melanogaster, Bombyx mori, and Caenorhabditis elegans). This phenomenon may suggest variable evolutionary rates during the metazoan radiations and slower changes in the fungal divergences, and/or a polyphyletic origin of metazoa. (iii) The genomic sequences of D. melanogaster and Trypanosoma brucei are strikingly similar. This DNA similarity might be explained by some molecular adaptation of the parasite to its dipteran (tsetse fly) host, a host-parasite gene transfer hypothesis.
Robustness of the methods may be due to a genomic signature of dinucleotide relative abundance values reflecting DNA structures related to dinucleotide stacking energies, constraints of DNA curvature, and mechanisms attendant to replication, repair, and recombination.
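The abstract names the dinucleotide relative abundance values and the distance built from them without giving formulas. A minimal sketch follows, assuming the usual odds-ratio form rho_XY = f_XY / (f_X f_Y) and a distance taken as the mean absolute difference of rho over the 16 dinucleotides. Pooling each sequence with its reverse complement is also an assumption, and the function names are hypothetical rather than taken from the paper.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
DINUCS = ["".join(p) for p in product(BASES, repeat=2)]
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement; non-ACGT symbols pass through unchanged."""
    return seq.translate(_COMPLEMENT)[::-1]


def relative_abundances(seq, symmetrize=True):
    """Map each dinucleotide XY to rho_XY = f_XY / (f_X * f_Y).

    With symmetrize=True (an assumption of this sketch), counts are pooled
    over the sequence and its reverse complement.
    """
    seq = seq.upper()
    strands = [seq, reverse_complement(seq)] if symmetrize else [seq]
    mono, di = Counter(), Counter()
    for strand in strands:
        mono.update(c for c in strand if c in BASES)
        # Count overlapping dinucleotides, skipping pairs with ambiguous symbols.
        for a, b in zip(strand, strand[1:]):
            if a in BASES and b in BASES:
                di[a + b] += 1
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_di == 0:
        raise ValueError("sequence contains no ACGT dinucleotides")
    rho = {}
    for d in DINUCS:
        expected = (mono[d[0]] / n_mono) * (mono[d[1]] / n_mono)
        # rho is undefined (NaN) when one of the two bases never occurs.
        rho[d] = (di[d] / n_di) / expected if expected > 0 else float("nan")
    return rho


def abundance_distance(seq_a, seq_b):
    """Mean absolute difference of rho over the 16 dinucleotides."""
    rho_a = relative_abundances(seq_a)
    rho_b = relative_abundances(seq_b)
    return sum(abs(rho_a[d] - rho_b[d]) for d in DINUCS) / len(DINUCS)
```

A value of rho_XY near 1 means the dinucleotide occurs about as often as its base composition predicts, while values well above or below 1 mark over- or under-representation. Calling `abundance_distance(sample_a, sample_b)` on two long DNA samples gives a single alignment-free number. Short inputs give noisy estimates, which is consistent with the abstract's emphasis on large DNA samples.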