Department of Public Health Sciences, University of Toronto, Health Sciences Building, Toronto, Ontario, Canada.
IEEE/ACM Trans Comput Biol Bioinform. 2010 Jan-Mar;7(1):50-63. doi: 10.1109/TCBB.2007.70267.
While clustering genes remains one of the most popular exploratory tools for expression data, it often results in highly variable and biologically uninformative clusters. This paper explores a data fusion approach to clustering microarray data. Our method, which combines expression data and Gene Ontology (GO)-derived information, is applied to a real data set to perform genome-wide clustering. A set of novel tools is proposed to validate the clustering results and to select a suitable value of the infusion coefficient. These tools measure stability, biological relevance, and distance from the expression-only clustering solution. Our results indicate that data-fusion clustering leads to more stable, biologically relevant clusters that are still representative of the experimental data.
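The idea of fusing two data sources through an infusion coefficient can be illustrated with a minimal sketch. The abstract does not state the fusion formula, so everything here is an assumption made for illustration: the coefficient `lam`, the convex combination of the two distance matrices, the correlation distance for expression profiles, the average-linkage hierarchical clustering, and the toy data. It is not the authors' implementation.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform


def fused_distance(expr, go_dist, lam):
    """Blend an expression-based distance with a GO-derived distance.

    ASSUMPTION: a convex combination weighted by `lam` in [0, 1].
    lam = 0 uses expression only; lam = 1 uses GO information only.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    # Correlation distance between gene expression profiles (rows of expr).
    expr_dist = squareform(pdist(expr, metric="correlation"))
    # Rescale both matrices to [0, 1] so neither source dominates by scale.
    expr_scaled = expr_dist / expr_dist.max()
    go_scaled = go_dist / go_dist.max()
    return (1.0 - lam) * expr_scaled + lam * go_scaled


def cluster_genes(expr, go_dist, lam, n_clusters):
    """Average-linkage hierarchical clustering on the fused distance."""
    fused = fused_distance(expr, go_dist, lam)
    condensed = squareform(fused, checks=False)  # square matrix -> condensed vector
    tree = linkage(condensed, method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Hypothetical toy data: 4 genes measured under 4 conditions.
expr = np.array([
    [1.0, 2.0, 3.0, 4.0],   # gene 0: rising profile
    [1.1, 2.0, 3.2, 3.9],   # gene 1: rising profile
    [4.0, 3.0, 2.0, 1.0],   # gene 2: falling profile
    [3.9, 3.1, 2.0, 1.2],   # gene 3: falling profile
])
# Hypothetical GO-derived distances: genes 0 and 2 share annotations,
# as do genes 1 and 3, which disagrees with the expression grouping.
go_dist = np.array([
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 0.0, 1.0, 0.0],
])

labels_expr = cluster_genes(expr, go_dist, lam=0.0, n_clusters=2)
labels_go = cluster_genes(expr, go_dist, lam=1.0, n_clusters=2)
```

In this toy setting, `lam=0.0` groups genes by profile shape and `lam=1.0` groups them by the hypothetical annotations. Intermediate values trade one source against the other, which is why a validation step for choosing the coefficient is needed.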