Du Wei, Sun Ying, Wang Yan, Cao Zhongbo, Zhang Chen, Liang Yanchun
College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun 130012, China.
Int J Data Min Bioinform. 2013;7(1):58-77. doi: 10.1504/ijdmb.2013.050977.
With the development of genome research, finding methods to classify cancer and detect biomarkers efficiently has become a challenging problem. In this paper, a novel multi-stage feature selection method is proposed that considers all kinds of genes in the original gene set. The method eliminates irrelevant, noisy and redundant genes and selects a subset of relevant genes at different stages. The proposed method is evaluated with different classifiers on microarray datasets for Leukemia, Prostate, Colon, Breast, Nervous and DLBCL; its best accuracies on these datasets are 100%, 98.04%, 100%, 89.74%, 100% and 98.28%, respectively.
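The abstract does not specify the criteria used at each stage, so the following is only a minimal sketch of what a multi-stage gene filter of this general kind might look like, not the authors' algorithm. Every choice here is an assumption made for illustration: the function name `multi_stage_select`, the use of a variance threshold as a stand-in for noise removal, a signal-to-noise score for relevance, a greedy Pearson-correlation filter for redundancy, and the default values of `min_var`, `top_k` and `max_corr`.

```python
import numpy as np

def multi_stage_select(X, y, min_var=1e-3, top_k=50, max_corr=0.9):
    """Hypothetical three-stage filter (illustrative, not the paper's method).

    X: array of shape (samples, genes); y: binary class labels (0 or 1).
    Returns the column indices of the retained genes, most relevant first.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    idx = np.arange(X.shape[1])

    # Stage 1 (assumed proxy for noise removal): drop near-constant genes.
    idx = idx[X.var(axis=0) > min_var]
    if idx.size == 0:
        return idx

    # Stage 2 (assumed relevance score): signal-to-noise ratio between
    # the two classes; keep the top_k highest-scoring genes.
    a, b = X[y == 0][:, idx], X[y == 1][:, idx]
    snr = np.abs(a.mean(axis=0) - b.mean(axis=0)) / (a.std(axis=0) + b.std(axis=0) + 1e-12)
    idx = idx[np.argsort(-snr)[:top_k]]  # sorted by descending relevance

    # Stage 3 (assumed redundancy criterion): walk genes in relevance order
    # and skip any gene too correlated with one that was already kept.
    corr = np.atleast_2d(np.abs(np.corrcoef(X[:, idx], rowvar=False)))
    kept = []
    for i in range(idx.size):
        if all(corr[i, j] <= max_corr for j in kept):
            kept.append(i)
    return idx[kept]
```

The genes returned by such a filter would then be passed to whichever classifiers are being compared. The thresholds would need tuning per dataset, ideally inside cross-validation so that the selection step never sees the test samples.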