Ling Xu, He Xin, Xin Dong
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
Bioinformatics. 2009 Mar 1;25(5):571-7. doi: 10.1093/bioinformatics/btp027. Epub 2009 Jan 21.
Spatial clusters of genes conserved across multiple genomes provide important clues to gene functions and evolution of genome organization. Existing methods of identifying these clusters often made restrictive assumptions, such as exact conservation of gene order, and relied on heuristic algorithms.
We developed a very efficient algorithm based on a 'gene teams' model that allows genes in the clusters to appear in different orders. This allows us to detect conserved gene clusters under flexible evolutionary constraints in a large number of genomes. Our statistical evaluation incorporates the evolutionary relationship among genomes, a key aspect that has been missing in most previous studies. We conducted a large-scale analysis of 133 bacterial genomes. Our results confirm that our approach is an effective way of uncovering functionally related genes. The comparison with known operons and the analysis of the structural properties of our predicted clusters suggest that operons are an important source of constraint, but that other forces also determine the evolution of gene order and arrangement. Using our method, we predicted functions of many poorly characterized genes in bacteria. The combined algorithmic and statistical methods we present here provide a rigorous framework for systematically studying evolutionary constraints of genomic contexts.
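To make the 'gene teams' idea concrete, the following is a minimal sketch, not the algorithm described in this article. It uses a naive partition-refinement approach under simplifying assumptions that are ours: each gene family occurs exactly once per genome, positions are plain integers, and a team is a maximal gene set in which neighbouring members lie at most `delta` positions apart in every genome, regardless of order. The names `split_by_gaps`, `gene_teams`, and `delta` are hypothetical, and the statistical evaluation is not modelled.

```python
def split_by_gaps(genes, positions, delta):
    """Split a gene set into runs whose neighbours are at most delta apart."""
    ordered = sorted(genes, key=lambda g: positions[g])
    runs, current = [], [ordered[0]]
    for prev, gene in zip(ordered, ordered[1:]):
        if positions[gene] - positions[prev] > delta:
            runs.append(current)
            current = []
        current.append(gene)
    runs.append(current)
    return [frozenset(run) for run in runs]


def gene_teams(genomes, delta):
    """Return teams (size >= 2) shared by all genomes.

    genomes: list of dicts mapping gene family -> integer position
             (assumption: one occurrence of each family per genome).
    """
    common = set.intersection(*(set(g) for g in genomes))
    if not common:
        return []
    pending = [frozenset(common)]
    teams = []
    while pending:
        candidate = pending.pop()
        for positions in genomes:
            parts = split_by_gaps(candidate, positions, delta)
            if len(parts) > 1:
                # The candidate breaks apart in this genome: refine further.
                pending.extend(parts)
                break
        else:
            # No genome splits the candidate, so it is a team.
            teams.append(candidate)
    return [team for team in teams if len(team) >= 2]


# Toy input: three genomes with the same five gene families in different orders.
toy_genomes = [
    {"a": 1, "b": 2, "c": 3, "d": 10, "e": 11},
    {"c": 1, "a": 2, "b": 3, "e": 20, "d": 21},
    {"b": 5, "a": 6, "d": 7, "e": 8, "c": 30},
]
toy_teams = gene_teams(toy_genomes, delta=2)
```

On the toy input, the genes `a` and `b` stay close together in all three genomes even though their order changes, as do `d` and `e`, while `c` drifts away in the third genome. Exact-order methods would miss such clusters whenever the order is rearranged.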
The software, data and the full results of this article are available online at http://www.ews.uiuc.edu/~xuling/mcmusec.