Huang Desheng, Pan Wei
Department of Mathematics, China Medical University, Shenyang, China.
Bioinformatics. 2006 May 15;22(10):1259-68. doi: 10.1093/bioinformatics/btl065. Epub 2006 Feb 24.
Because co-expressed genes are likely to share the same biological function, cluster analysis of gene expression profiles has been applied for gene function discovery. Most existing clustering methods ignore known gene functions in the process of clustering.
To take advantage of accumulating gene functional annotations, we propose incorporating known gene functions into a new distance metric, which shrinks a gene expression-based distance towards 0 if and only if the two genes share a common gene function. A two-step procedure is used. First, the shrinkage distance metric is used in any distance-based clustering method, e.g. K-medoids or hierarchical clustering, to cluster the genes with known functions. Second, while keeping the clustering results from the first step for the genes with known functions, the expression-based distance metric is used to cluster the remaining genes of unknown function, assigning each of them either to one of the clusters obtained in the first step or to a new cluster. A simulation study and an application to gene function prediction in yeast demonstrate the advantage of our proposal over the standard method.
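The shrinkage idea and the second-step assignment can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the exact form of the metric, so the multiplicative factor `shrink`, the Euclidean expression distance, the cutoff `new_cluster_threshold`, and all function names are assumptions made for illustration. The first-step clustering itself is omitted: any distance-based method that accepts a precomputed distance matrix could take the shrunken matrix as input.

```python
import numpy as np

def expression_distance(X):
    """Pairwise Euclidean distance between rows of X (genes x conditions)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def shrinkage_distance(D, annotations, shrink=0.5):
    """Scale D[i, j] by `shrink` if and only if genes i and j share a function.

    annotations: one set of function labels per gene.
    shrink: assumed tuning parameter in [0, 1]; 0 collapses the distance
            to 0, 1 leaves the expression-based distance unchanged.
    """
    D_shrunk = D.copy()
    n = len(annotations)
    for i in range(n):
        for j in range(i + 1, n):
            if annotations[i] & annotations[j]:  # non-empty intersection
                D_shrunk[i, j] *= shrink
                D_shrunk[j, i] *= shrink
    return D_shrunk

def assign_unknown(X_unknown, X_medoids, new_cluster_threshold):
    """Simplified step 2: assign each unannotated gene to its nearest
    first-step medoid by expression distance, or return -1 (candidate for
    a new cluster) when even the nearest medoid is beyond the threshold."""
    diff = X_unknown[:, None, :] - X_medoids[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    nearest = d.argmin(axis=1)
    return np.where(d.min(axis=1) <= new_cluster_threshold, nearest, -1)

# Toy data: three annotated genes measured under two conditions.
X_known = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
annotations = [{"A"}, {"A", "B"}, {"C"}]

D = expression_distance(X_known)
D_shrunk = shrinkage_distance(D, annotations, shrink=0.5)

# Pretend step 1 chose genes 0 and 2 as medoids (hypothetical outcome).
X_medoids = X_known[[0, 2]]
X_unknown = np.array([[0.0, 1.0], [100.0, 100.0]])
labels = assign_unknown(X_unknown, X_medoids, new_cluster_threshold=3.0)
```

In this toy example only genes 0 and 1 share a label, so only their distance is reduced. The second unannotated gene lies far from both medoids and is flagged with -1 rather than forced into an existing cluster.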