Tan Qihua, Thomassen Mads, Jochumsen Kirsten M, Mogensen Ole, Christensen Kaare, Kruse Torben A
Epidemiology, Institute of Public Health, University of Southern Denmark, J. B. Winsløws Vej 9B, 5000 Odense C, Denmark.
Adv Bioinformatics. 2009;2009:480486. doi: 10.1155/2009/480486. Epub 2009 Jul 30.
Unlike significance analysis of gene expression, which looks for genes that are differentially regulated, feature selection in microarray-based prognostic gene expression analysis aims to find a subset of marker genes that are not only differentially expressed but also informative for prediction. Unfortunately, feature selection in the microarray literature is dominated by the simple heuristic univariate gene filter paradigm, which selects differentially expressed genes according to their statistical significance. We introduce a combinatory feature selection strategy that integrates differential gene expression analysis with the Gram-Schmidt process to identify prognostic genes that are both statistically significant and highly informative for predicting tumour survival outcomes. Empirical application to leukemia and ovarian cancer survival data, through within-study and cross-study validation, shows that the feature space can be greatly reduced while achieving improved testing performance.
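The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a continuous outcome in place of survival times, uses absolute Pearson correlation as a stand-in for the differential expression statistic, and all function names (`univariate_filter`, `gram_schmidt_select`) and the toy data are hypothetical. Stage 1 keeps the genes with the strongest univariate association. Stage 2 repeatedly picks the gene most aligned with the part of the outcome that is still unexplained, then projects that gene out of the outcome and of the remaining candidates (the Gram-Schmidt step), so redundant genes lose their appeal.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def center(v):
    mean = sum(v) / len(v)
    return [a - mean for a in v]

def univariate_filter(genes, y, keep):
    """Stage 1 (assumed score): rank genes by |Pearson r| with y, keep the top `keep`."""
    yc = center(y)
    scores = {}
    for name, values in genes.items():
        xc = center(values)
        denom = math.sqrt(dot(xc, xc) * dot(yc, yc))
        scores[name] = abs(dot(xc, yc)) / denom if denom > 0 else 0.0
    return sorted(scores, key=scores.get, reverse=True)[:keep]

def gram_schmidt_select(genes, y, candidates, n_select):
    """Stage 2: forward selection with Gram-Schmidt orthogonalisation."""
    residual_y = center(y)
    remaining = {g: center(genes[g]) for g in candidates}
    selected = []

    def cos2(g):
        # Squared cosine between a candidate and the unexplained outcome.
        x = remaining[g]
        denom = dot(x, x) * dot(residual_y, residual_y)
        return dot(x, residual_y) ** 2 / denom if denom > 1e-12 else 0.0

    while remaining and len(selected) < n_select:
        best = max(remaining, key=cos2)
        if cos2(best) == 0.0:
            break  # nothing left that explains the residual outcome
        q = remaining.pop(best)
        qq = dot(q, q)
        selected.append(best)
        # Project the chosen direction out of the outcome ...
        coef_y = dot(residual_y, q) / qq
        residual_y = [a - coef_y * b for a, b in zip(residual_y, q)]
        # ... and out of every remaining candidate.
        for g, x in remaining.items():
            coef = dot(x, q) / qq
            remaining[g] = [a - coef * b for a, b in zip(x, q)]
    return selected

# Toy data (hypothetical): g2 is nearly a copy of g1, g3 carries independent
# signal, g4 is noise; the outcome is y = 2*g1 + 1.5*g3.
genes = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "g2": [0.9, 2.1, 3.1, 3.9, 4.9, 6.1],
    "g3": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
    "g4": [1.0, -1.0, -1.0, 1.0, 1.0, -1.0],
}
y = [2 * a + 1.5 * b for a, b in zip(genes["g1"], genes["g3"])]

top2_univariate = univariate_filter(genes, y, keep=2)
filtered = univariate_filter(genes, y, keep=3)
selected = gram_schmidt_select(genes, y, filtered, n_select=2)
print(top2_univariate)  # ['g1', 'g2']
print(selected)         # ['g1', 'g3']
```

On this toy input, a purely univariate top-2 filter returns the redundant pair g1 and g2, whereas the Gram-Schmidt stage replaces g2 with g3, which is weak on its own but explains what g1 leaves over. With real survival data, the univariate score and the target of the projection would need to be replaced by survival-appropriate quantities.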