Pavlidis Paul, Weston Jason, Cai Jinsong, Noble William Stafford
Columbia Genome Center, Columbia University, New York, NY 10027, USA.
J Comput Biol. 2002;9(2):401-11. doi: 10.1089/10665270252935539.
In our attempts to understand cellular function at the molecular level, we must be able to synthesize information from disparate types of genomic data. We consider the problem of inferring gene functional classifications from a heterogeneous data set consisting of DNA microarray expression measurements and phylogenetic profiles from whole-genome sequence comparisons. We demonstrate the application of the support vector machine (SVM) learning algorithm to this functional inference task. Our results suggest the importance of exploiting prior information about the heterogeneity of the data. In particular, we propose an SVM kernel function that is explicitly heterogeneous. In addition, we describe feature scaling methods for further exploiting prior knowledge of heterogeneity by giving each data type different weights.
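As a rough illustration of what an explicitly heterogeneous kernel could look like, the sketch below computes a separate polynomial kernel on each data type and adds the results, so that no cross terms mix expression features with phylogenetic-profile features. The abstract does not give the kernel's exact form, so the polynomial degree, the additive combination, the per-type weights, and all function and parameter names here are assumptions made for illustration only. A kernel on the concatenated feature vector is included for contrast.

```python
from typing import Sequence


def poly_kernel(x: Sequence[float], y: Sequence[float], degree: int = 3) -> float:
    """Polynomial kernel (x . y + 1)^degree on a single feature block."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + 1.0) ** degree


def heterogeneous_kernel(
    x_expr: Sequence[float],
    x_phylo: Sequence[float],
    y_expr: Sequence[float],
    y_phylo: Sequence[float],
    w_expr: float = 1.0,   # assumed weight for the expression data type
    w_phylo: float = 1.0,  # assumed weight for the phylogenetic-profile data type
    degree: int = 3,       # assumed polynomial degree
) -> float:
    """Weighted sum of per-data-type kernels.

    Each term only sees features of one data type. With non-negative
    weights, the sum of two valid kernels is again a valid kernel.
    """
    return (
        w_expr * poly_kernel(x_expr, y_expr, degree)
        + w_phylo * poly_kernel(x_phylo, y_phylo, degree)
    )


def concatenated_kernel(
    x_expr: Sequence[float],
    x_phylo: Sequence[float],
    y_expr: Sequence[float],
    y_phylo: Sequence[float],
    degree: int = 3,
) -> float:
    """Baseline for contrast: one kernel over the concatenated features,
    which does introduce cross terms between the two data types."""
    x = list(x_expr) + list(x_phylo)
    y = list(y_expr) + list(y_phylo)
    return poly_kernel(x, y, degree)


if __name__ == "__main__":
    # Toy, made-up vectors: (expression features, phylogenetic profile).
    gene_a = ([1.0, 0.0], [1.0, 1.0])
    gene_b = ([0.5, 2.0], [0.0, 1.0])
    print(heterogeneous_kernel(gene_a[0], gene_a[1], gene_b[0], gene_b[1]))
    print(concatenated_kernel(gene_a[0], gene_a[1], gene_b[0], gene_b[1]))
```

A kernel function of this shape could be evaluated over all gene pairs to build a Gram matrix for any SVM implementation that accepts precomputed kernels. Changing `w_expr` and `w_phylo` is one simple way to give each data type a different weight; it is not necessarily the scaling scheme the authors used.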