Chen Zheng, Tang Jian
Department of Computer Science, Memorial University of Newfoundland, St. John's, NL, A1B 3X5, Canada.
Int J Data Min Bioinform. 2010;4(5):520-34. doi: 10.1504/ijdmb.2010.035898.
Reducing redundancy is an important goal for most feature selection methods. Almost all methods for redundancy reduction are based on the correlation between gene expression levels. In this paper, we utilise the knowledge in Gene Ontology to provide a new model for measuring redundancy among genes. We propose a novel similarity measure that incorporates both semantic and expression-level similarities. We compare our method with the traditional expression-value-only similarity model on several public microarray datasets. The experimental results show that our approach achieves the same or higher classification accuracy while selecting a smaller set of gene features.
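The abstract does not state how the semantic and expression-level similarities are combined. Purely as an illustration, the minimal sketch below assumes a weighted average of the two. The function names, the weight `alpha`, the use of absolute Pearson correlation for the expression part, and the assumption that a Gene Ontology semantic similarity in [0, 1] has already been computed elsewhere are all hypothetical choices, not the authors' definition.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant profile carries no correlation information.
        return 0.0
    return cov / sqrt(var_x * var_y)


def combined_similarity(expr_a, expr_b, semantic_sim, alpha=0.5):
    """Hypothetical blend of semantic and expression-level similarity.

    semantic_sim: precomputed Gene Ontology similarity in [0, 1] (assumed input).
    alpha: assumed weight on the semantic part; (1 - alpha) goes to expression.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not 0.0 <= semantic_sim <= 1.0:
        raise ValueError("semantic_sim must lie in [0, 1]")
    # Strong negative correlation is treated as redundancy too (assumption).
    expression_sim = abs(pearson(expr_a, expr_b))
    return alpha * semantic_sim + (1.0 - alpha) * expression_sim
```

Under these assumptions, two genes would be treated as redundant when `combined_similarity` exceeds a chosen threshold. Setting `alpha=0` reduces the measure to the expression-only baseline that the abstract compares against.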