Boratyn Grzegorz M, Datta Susmita, Datta Somnath
Clinical Proteomics Center, University of Louisville, Louisville, KY 40202, USA.
Bioinformation. 2007 Apr 10;1(10):396-405. doi: 10.6026/97320630001396.
In this paper we propose a data-based algorithm that combines existing biological knowledge (e.g., functional annotations of genes) with experimental data (gene expression profiles) to create an overall dissimilarity that can be used with any clustering algorithm that accepts a general dissimilarity matrix. We explore this idea with two publicly available gene expression data sets and functional annotations, and compare the results with clustering results that use only the experimental data. Although more elaborate evaluations might be called for, the present paper makes a strong case for utilizing existing biological information in the clustering process.
Supplement is available at www.somnathdatta.org/Supp/Bioinformation/appendix.pdf.
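The general idea described in the abstract, blending an annotation-based dissimilarity with an expression-based one and passing the result to a clustering routine, can be illustrated with a minimal sketch. It is not the authors' algorithm, because the abstract does not state how the two sources are combined or how the weighting is chosen from the data. The sketch assumes a convex combination with a fixed, hypothetical `weight`, a correlation-based expression dissimilarity, a Jaccard dissimilarity on annotation sets, and plain average-linkage clustering. All gene names, annotation terms, and values are invented toy data.

```python
import math


def correlation_dissimilarity(x, y):
    """(1 - Pearson r) / 2, so the value lies in [0, 1].

    Assumes neither profile is constant (otherwise the variance is zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (1.0 - sxy / math.sqrt(sxx * syy)) / 2.0


def jaccard_dissimilarity(a, b):
    """1 - |a & b| / |a | b| on two sets of annotation terms."""
    union = a | b
    if not union:
        return 1.0  # assumption: two unannotated genes share no known function
    return 1.0 - len(a & b) / len(union)


def combined_dissimilarity(profiles, annotations, weight):
    """Convex combination of the two dissimilarities for every gene pair.

    `weight` is a hypothetical fixed tuning parameter in [0, 1]; the paper
    describes a data-based choice that is not reproduced here.
    """
    genes = sorted(profiles)
    d = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            d_expr = correlation_dissimilarity(profiles[g], profiles[h])
            d_annot = jaccard_dissimilarity(annotations[g], annotations[h])
            d[frozenset((g, h))] = weight * d_annot + (1.0 - weight) * d_expr
    return d


def average_linkage(genes, d, k):
    """Merge the closest pair of clusters until only k clusters remain."""
    clusters = [[g] for g in genes]

    def dist(c1, c2):
        total = sum(d[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy data (invented for illustration only).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 2.9, 4.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [3.9, 3.2, 1.8, 1.1],
}
annotations = {
    "geneA": {"term1", "term2"},
    "geneB": {"term1", "term2"},
    "geneC": {"term3"},
    "geneD": {"term3", "term4"},
}

d = combined_dissimilarity(profiles, annotations, weight=0.5)
clusters = average_linkage(sorted(profiles), d, k=2)
print(clusters)
```

Because the combined quantity is just a pairwise dissimilarity, any other routine that accepts a general dissimilarity matrix could replace `average_linkage` here. Setting `weight=0` recovers clustering on the experimental data alone, which is the baseline the abstract compares against.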