Incorporating biological knowledge into distance-based clustering analysis of microarray gene expression data.

Authors

Huang Desheng, Pan Wei

Affiliation

Department of Mathematics, China Medical University, Shenyang, China.

Publication

Bioinformatics. 2006 May 15;22(10):1259-68. doi: 10.1093/bioinformatics/btl065. Epub 2006 Feb 24.

Abstract

MOTIVATION

Because co-expressed genes are likely to share the same biological function, cluster analysis of gene expression profiles has been applied for gene function discovery. Most existing clustering methods ignore known gene functions in the process of clustering.

RESULTS

To take advantage of accumulating gene functional annotations, we propose incorporating known gene functions into a new distance metric, which shrinks a gene expression-based distance towards 0 if and only if the two genes share a common gene function. A two-step procedure is used. First, the shrinkage distance metric is used in any distance-based clustering method, e.g. K-medoids or hierarchical clustering, to cluster the genes with known functions. Second, while keeping the clustering results from the first step for the genes with known functions, the expression-based distance metric is used to cluster the remaining genes of unknown function, assigning each of them to either one of the clusters obtained in the first step or some new clusters. A simulation study and an application to gene function prediction for the yeast demonstrate the advantage of our proposal over the standard method.
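The two-step procedure described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the exact form of the shrinkage, so the multiplicative factor `(1 - r)`, the parameter `r`, the Euclidean base distance, the basic K-medoids loop, and the `threshold` rule for opening new clusters in the second step are all assumptions made for the example, as are all function names.

```python
import math

def euclidean(x, y):
    """Plain expression-based distance between two profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def shrinkage_distance(x, y, funcs_x, funcs_y, r=0.5):
    """Scale the expression distance by (1 - r) only if the genes share a function.

    The multiplicative form and the parameter r are assumptions for illustration.
    """
    d = euclidean(x, y)
    return (1.0 - r) * d if funcs_x & funcs_y else d

def k_medoids(dist, k, max_iter=100):
    """Basic alternating K-medoids on a precomputed distance matrix."""
    n = len(dist)
    medoids = list(range(k))  # deterministic start, chosen only for reproducibility

    def assign(meds):
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties out
                updated.append(medoids[c])
                continue
            updated.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return assign(medoids), medoids

def cluster_known(profiles, functions, k, r=0.5):
    """Step 1: cluster annotated genes with the shrinkage distance."""
    n = len(profiles)
    dist = [[shrinkage_distance(profiles[i], profiles[j], functions[i], functions[j], r)
             for j in range(n)] for i in range(n)]
    return k_medoids(dist, k)

def assign_unknown(unknown, known_profiles, medoids, threshold):
    """Step 2: place unannotated genes using the unshrunken expression distance.

    Hypothetical rule: join the nearest existing cluster if its medoid is within
    `threshold`, otherwise open a new cluster centred on the gene itself.
    """
    centres = [known_profiles[m] for m in medoids]
    labels = []
    for x in unknown:
        dists = [euclidean(x, c) for c in centres]
        best = min(range(len(centres)), key=dists.__getitem__)
        if dists[best] > threshold:
            centres.append(x)
            best = len(centres) - 1
        labels.append(best)
    return labels

# Toy data: four annotated genes (two functions) and two unannotated genes.
known = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]]
annotations = [{"A"}, {"A"}, {"B"}, {"B"}]
known_labels, medoids = cluster_known(known, annotations, k=2)
new_labels = assign_unknown([[0.5, 0.2], [20.0, 20.0]], known, medoids, threshold=2.0)
print(known_labels, new_labels)  # [0, 0, 1, 1] [0, 2]
```

In this toy run the first unannotated gene joins an existing cluster, while the second lies far from both medoids and starts a new one. The step-one labels for the annotated genes are never revisited in step two, which mirrors the "keeping the clustering results from the first step" constraint in the abstract.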
