Liu Jinze, Wang Wei, Yang Jiong
Department of Computer Science, University of North Carolina, Chapel Hill, 27599, USA.
Proc IEEE Comput Syst Bioinform Conf. 2004:436-47.
The soundness of clustering in the analysis of gene expression profiles and in gene function prediction rests on the hypothesis that genes with similar expression profiles are likely to have strongly correlated functions in biological activities. Gene Ontology (GO) has become a widely accepted standard for organizing gene function categories. Different gene function categories in GO can have complex relationships, such as 'part of' and 'overlapping'. To date, no clustering algorithm has been able to generate gene clusters whose mutual relationships naturally reflect those of the gene function categories in the GO hierarchy. Failing to reflect these relationships may reduce confidence in clustering for gene function prediction. In this paper, we present a new clustering technique, Smart Hierarchical Tendency Preserving clustering (SHTP-clustering), based on a bicluster model, the Tendency Preserving cluster (TP-Cluster). By incorporating Gene Ontology information directly into the clustering process, the SHTP-clustering algorithm yields a TP-Cluster tree in which any subtree can be well mapped to a part of the GO hierarchy. Our experiments on yeast cell cycle data demonstrate that this method is efficient and effective in generating biologically relevant TP-Clusters.