Kim Sun, Choi Jeong-Hyeon, Yang Jiong
School of Informatics, Center for Genomics and Bioinformatics, Indiana University, IN 47408, USA.
Proc IEEE Comput Syst Bioinform Conf. 2005:44-55. doi: 10.1109/csb.2005.33.
Functionally related genes co-evolve, probably because of strong selection pressure during evolution, so we expect them to be present together in multiple genomes. Physical proximity among genes, known as a gene team, is a very useful concept for discovering functionally related genes in multiple genomes. However, there are also many gene sets that do not preserve physical proximity. In this paper, we generalize the gene team model, which looks for gene clusters in a physically clustered form, to the multiple-genome case with a relaxed constraint. We propose a novel hybrid pattern model that combines the set and sequential pattern models. Our model searches for gene clusters with and/or without the physical proximity constraint. The model was implemented and tested on 97 genomes (120 replicons), and the results were analyzed to show its usefulness. In particular, analysis of gene clusters that belong to B. subtilis and E. coli demonstrated that our model predicted many experimentally verified operons and functionally related clusters. Our program is fast enough to be offered as a web service at http://platcom.informatics.indiana.edu/platcom/, where users can select any combination of the 97 genomes to predict gene teams.