Howard Hughes Medical Institute and Department of Molecular Biology, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Statistics, Harvard University, Cambridge, MA 02138, USA.
Howard Hughes Medical Institute and Department of Molecular Biology, Massachusetts General Hospital, Boston, MA 02114, USA; Broad Institute, Cambridge, MA 02141, USA.
Cell. 2014 Jul 3;158(1):213-25. doi: 10.1016/j.cell.2014.05.034.
The availability of diverse genomes makes it possible to predict gene function based on shared evolutionary history. This approach can be challenging, however, for pathways whose components do not exhibit a shared history but rather consist of distinct "evolutionary modules." We introduce a computational algorithm, clustering by inferred models of evolution (CLIME), which inputs a eukaryotic species tree, homology matrix, and pathway (gene set) of interest. CLIME partitions the gene set into disjoint evolutionary modules, simultaneously learning the number of modules and a tree-based evolutionary history that defines each module. CLIME then expands each module by scanning the genome for new components that likely arose under the inferred evolutionary model. Application of CLIME to ∼1,000 annotated human pathways and to the proteomes of yeast, red algae, and malaria reveals unanticipated evolutionary modularity and coevolving components. CLIME is freely available and should become increasingly powerful with the growing wealth of eukaryotic genomes.
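The partition-then-expand workflow described in the abstract can be illustrated with a toy sketch. This is not CLIME's method: the abstract describes a tree-based evolutionary model that also learns the number of modules, whereas the sketch below ignores the species tree and substitutes a plain Hamming distance between presence/absence profiles, with a fixed threshold. All names (`partition`, `expand`, `max_dist`, the gene labels) and the five-species homology matrix are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple

# ASSUMPTION: a gene's row in the homology matrix is reduced to a binary
# presence/absence profile (1 = homolog found in that species, 0 = absent).
Profile = Tuple[int, ...]


def hamming(a: Profile, b: Profile) -> int:
    """Count the species in which two profiles disagree."""
    return sum(x != y for x, y in zip(a, b))


def partition(gene_set: List[str], homology: Dict[str, Profile],
              max_dist: int) -> List[List[str]]:
    """Greedily split a gene set into disjoint modules.

    A gene joins the first module whose seed (first gene) lies within
    max_dist; otherwise it starts a new module. This is a stand-in for
    CLIME's model-based clustering, not a reimplementation of it.
    """
    modules: List[List[str]] = []
    for gene in gene_set:
        for module in modules:
            if hamming(homology[gene], homology[module[0]]) <= max_dist:
                module.append(gene)
                break
        else:
            modules.append([gene])  # no existing module fits
    return modules


def expand(module: List[str], homology: Dict[str, Profile],
           max_dist: int) -> List[str]:
    """Scan every gene for non-members whose profile matches the module seed."""
    seed = homology[module[0]]
    return [gene for gene, profile in homology.items()
            if gene not in module and hamming(profile, seed) <= max_dist]


# Hypothetical homology matrix: rows are genes, columns are five species.
homology = {
    "geneA": (1, 1, 1, 0, 0),
    "geneB": (1, 1, 1, 0, 0),
    "geneC": (0, 0, 1, 1, 1),
    "geneD": (0, 0, 1, 1, 1),
    "geneE": (1, 1, 1, 0, 0),  # outside the pathway; an expansion candidate
}
pathway = ["geneA", "geneB", "geneC", "geneD"]

modules = partition(pathway, homology, max_dist=0)
expanded = [expand(module, homology, max_dist=0) for module in modules]
print(modules)   # [['geneA', 'geneB'], ['geneC', 'geneD']]
print(expanded)  # [['geneE'], []]
```

In this toy input the pathway splits into two modules with distinct profiles, and the genome scan recovers `geneE` as a candidate for the first module only. A tree-aware model would additionally distinguish profiles that arise from a single gain or loss on the species tree from those requiring many independent events, which a Hamming distance cannot do.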