Hannenhalli S, Chappey C, Koonin E V, Pevzner P A
Department of Computer Science and Engineering, Pennsylvania State University, University Park 16802, USA.
Genomics. 1995 Nov 20;30(2):299-311. doi: 10.1006/geno.1995.9873.
As large portions of related genomes are being sequenced, methods for comparing complete or nearly complete genomes, as opposed to comparing individual genes, are becoming progressively more important. A major, widespread phenomenon in genome evolution is the rearrangement of genes and gene blocks. There is, however, no consistent method for genome sequence comparison combined with the reconstruction of the evolutionary history of highly rearranged genomes. We developed a schema for genome sequence comparison that includes three successive steps: (i) comparison of all proteins encoded in different genomes and generation of genomic similarity plots; (ii) construction of an alphabet of conserved genes and gene blocks; and (iii) generation of most parsimonious genome rearrangement scenarios. The approach is illustrated by a comparison of the herpesvirus genomes that constitute the largest set of relatively long, complete genome sequences available to date. Herpesviruses have from 70 to about 200 genes; comparison of the amino acid sequences encoded in these genes results in an alphabet of about 30 conserved genes comprising 7 conserved blocks that are rearranged in the genomes of different herpesviruses. Algorithms to analyze rearrangements of multiple genomes were developed and applied to the derivation of most parsimonious scenarios of herpesvirus evolution under different evolutionary models. The developed approaches to genome comparison will be applicable to the comparative analysis of bacterial and eukaryotic genomes as soon as their sequences become available.
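Step (iii) of the schema, the search for most parsimonious rearrangement scenarios, can be illustrated with a minimal sketch. It assumes each genome has already been reduced to a signed ordering of the conserved blocks, where the sign marks orientation. The code counts breakpoints between two such orderings and derives a simple lower bound on the number of reversals needed, since one reversal can remove at most two breakpoints. This is not the algorithm used in the paper. The function names and the example block orders are hypothetical and are not taken from the herpesvirus data.

```python
from math import ceil


def relabel(genome, reference):
    """Rewrite `genome` so that `reference` becomes +1, +2, ..., +n.

    Both arguments are lists of nonzero signed integers over the same
    set of block labels; the sign encodes block orientation.
    """
    # Map each block label to its 1-based position and sign in the reference.
    pos = {abs(b): (i + 1, 1 if b > 0 else -1) for i, b in enumerate(reference)}
    out = []
    for b in genome:
        idx, ref_sign = pos[abs(b)]
        sign = (1 if b > 0 else -1) * ref_sign
        out.append(sign * idx)
    return out


def breakpoints(perm):
    """Count breakpoints of a signed permutation relative to the identity."""
    # Frame the permutation with 0 and n+1 so the two ends are also checked.
    ext = [0] + list(perm) + [len(perm) + 1]
    # An adjacency (a, b) is conserved only when b == a + 1.
    return sum(1 for a, b in zip(ext, ext[1:]) if b - a != 1)


def reversal_lower_bound(perm):
    """Lower bound on reversal distance: each reversal fixes <= 2 breakpoints."""
    return ceil(breakpoints(perm) / 2)


if __name__ == "__main__":
    # Hypothetical block orders for two genomes sharing 7 conserved blocks.
    genome_a = [1, 2, 3, 4, 5, 6, 7]
    genome_b = [1, -3, -2, 4, 5, 6, 7]  # blocks 2..3 inverted as a unit
    perm = relabel(genome_b, genome_a)
    print(breakpoints(perm), reversal_lower_bound(perm))
```

The bound is only a lower limit. Computing the exact reversal distance for signed permutations requires analysis of the breakpoint graph, which this sketch does not attempt.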