Stoye Jens, Wittler Roland
Genome Informatics Group, Faculty of Technology, Bielefeld University, 33594 Bielefeld, Germany.
IEEE/ACM Trans Comput Biol Bioinform. 2009 Jul-Sep;6(3):387-400. doi: 10.1109/TCBB.2008.135.
The order of genes in genomes provides extensive information. In comparative genomics, differences or similarities of gene orders are determined to predict functional relations of genes or phylogenetic relations of genomes. For this purpose, various combinatorial models can be used to identify gene clusters--groups of genes that are colocated in a set of genomes. We introduce a unified approach to model gene clusters and define the problem of labeling the inner nodes of a given phylogenetic tree with sets of gene clusters. Our optimization criterion in this context combines two properties: parsimony, i.e., the number of gains and losses of gene clusters has to be minimal, and consistency, i.e., for each ancestral node, there must exist at least one potential gene order that contains all the reconstructed clusters. We present and evaluate an exact algorithm to solve this problem. Despite its exponential worst-case time complexity, our method is suitable even for large-scale data. We show the effectiveness and efficiency on both simulated and real data.
基因组中基因的顺序提供了丰富的信息。在比较基因组学中,确定基因顺序的差异或相似性以预测基因的功能关系或基因组的系统发育关系。为此,可以使用各种组合模型来识别基因簇——即在一组基因组中位于同一位置的基因组。我们引入一种统一的方法来对基因簇进行建模,并定义了用基因簇集标记给定系统发育树内部节点的问题。在此背景下,我们的优化标准结合了两个属性:简约性,即基因簇的获得和丢失数量必须最小;一致性,即对于每个祖先节点,必须存在至少一种包含所有重建簇的潜在基因顺序。我们提出并评估了一种解决此问题的精确算法。尽管其最坏情况时间复杂度为指数级,但我们的方法甚至适用于大规模数据。我们在模拟数据和真实数据上都展示了其有效性和效率。