Binnewies Tim T, Motro Yair, Hallin Peter F, Lund Ole, Dunn David, La Tom, Hampson David J, Bellgard Matthew, Wassenaar Trudy M, Ussery David W
Center for Biological Sequence Analysis, Technical University of Denmark, 2800, Lyngby, Denmark.
Funct Integr Genomics. 2006 Jul;6(3):165-85. doi: 10.1007/s10142-006-0027-2. Epub 2006 May 12.
It has been more than 10 years since the first bacterial genome sequence was published. Hundreds of bacterial genome sequences are now available for comparative genomics, and searching a given protein against more than a thousand genomes will soon be possible. This review addresses a relatively straightforward question: "What have we learned from this vast amount of new genomic data?" Perhaps one of the most important lessons has been that genetic diversity, at the level of large-scale variation even amongst genomes of the same species, is far greater than was thought. The classical textbook view of evolution, which relies on the relatively slow accumulation of mutational events at the level of individual bases scattered throughout the genome, has changed. One of the most obvious conclusions from examining the sequences of several hundred bacterial genomes is the enormous amount of diversity, even among different genomes from the same bacterial species. This diversity is generated by a variety of mechanisms, including mobile genetic elements and bacteriophages. An examination of the 20 Escherichia coli genomes sequenced so far dramatically illustrates this, with genome sizes ranging from 4.6 to 5.5 Mbp; much of the variation appears to be of phage origin. This review also addresses mobile genetic elements, including pathogenicity islands and the structure of transposable elements. There are at least 20 different methods available to compare bacterial genomes. Metagenomics offers the chance to study genomic sequences found in ecosystems, including genomes of species that are difficult to culture. It has become clear that a genome sequence represents more than just a collection of gene sequences for an organism, and that information concerning the environment and growth conditions of the organism is important for interpreting the genomic data. The newly proposed Minimal Information about a Genome Sequence standard has been developed to obtain this information.