Dotan-Cohen Dikla, Melkman Avraham A, Kasif Simon
Department of Computer Science, Ben Gurion University, Beer Sheva 84105, Israel.
Bioinformatics. 2007 Dec 15;23(24):3335-42. doi: 10.1093/bioinformatics/btm526. Epub 2007 Nov 7.
Hierarchical clustering is widely used to cluster genes into groups based on their expression similarity. This method first constructs a tree. Next, the tree is partitioned into subtrees by cutting all edges at some level, thereby inducing a clustering. Unfortunately, the resulting clusters often do not exhibit significant functional coherence.
To improve the biological significance of the clustering, we develop a new framework of partitioning by snipping: cutting selected edges at variable levels. The snipped edges are selected to induce clusters that are maximally consistent with partially available background knowledge such as functional classifications. Algorithms for two key applications are presented: functional prediction of genes, and discovery of functionally enriched clusters of co-expressed genes. Simulation results and cross-validation tests indicate that the algorithms perform well even when the actual number of clusters differs considerably from the requested number. Performance is improved compared with a previously proposed algorithm.
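To make the idea of cutting at variable levels concrete, here is a simplified sketch. It is not the published algorithm: it assumes each cluster must be a whole subtree, scores a cluster by the number of labeled genes sharing its majority label, and uses a small dynamic program to pick the best split into `k` subtrees. The tree encoding `(left, right)`, the labels, and the name `snip` are hypothetical.

```python
from collections import Counter

# Hypothetical encoding: a leaf is a gene name (str); an internal node
# is a pair (left_subtree, right_subtree). Merge heights are not needed.

def leaves(node):
    """Collect the gene names under a node, left to right."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)

def snip(tree, labels, k):
    """Split `tree` into exactly k subtrees so that the summed majority-label
    counts are maximal. Returns (score, clusters), or None if the tree
    cannot be split into k subtrees. Unlabeled genes do not affect the score."""
    def purity(node):
        counts = Counter(labels[g] for g in leaves(node) if g in labels)
        return max(counts.values(), default=0)

    def best(node, parts):
        if parts == 1:
            return purity(node), [leaves(node)]
        if isinstance(node, str):
            return None  # a single gene cannot be split further
        left, right = node
        options = []
        for j in range(1, parts):
            a, b = best(left, j), best(right, parts - j)
            if a is not None and b is not None:
                options.append((a[0] + b[0], a[1] + b[1]))
        return max(options, key=lambda o: o[0]) if options else None

    return best(tree, k)

tree = ((("g1", "g2"), "g3"), (("g4", "g5"), "g6"))
labels = {"g1": "A", "g2": "A", "g3": "A", "g4": "B", "g5": "B", "g6": "C"}
score, clusters = snip(tree, labels, 3)
print(score, clusters)  # 6 [['g1', 'g2', 'g3'], ['g4', 'g5'], ['g6']]
```

In this toy case the left branch is kept whole while the right branch is split one level deeper, which a single fixed-level cut would not necessarily produce.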
A Java package is available at http://www.cs.bgu.ac.il/~dotna/TreeSnipping