Su Qiang, Wang Yina, Jiang Xiaobing, Chen Fuxue, Lu Wen-Cong
School of Communication & Information Engineering, Shanghai University, Shanghai 200444, China.
Department of VIP Medical Center, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou 510630, China.
Biomed Res Int. 2017;2017:1645619. doi: 10.1155/2017/1645619. Epub 2017 May 8.
To address the challenging problem of selecting distinguishing genes from cancer gene expression datasets, this paper presents a gene subset selection algorithm based on the Kolmogorov-Smirnov (K-S) test and correlation-based feature selection (CFS) principles. The algorithm first uses the K-S test to select distinguishing genes and then applies CFS to choose a subset of the genes retained by the K-S test.
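The two-stage pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: class labels are assumed to be binary (0/1); genes are ranked by the two-sample K-S statistic and the top `top_k` are kept (the abstract does not state the cut-off used, and a p-value threshold is another common choice); CFS uses Hall's merit formula with absolute Pearson correlation as the correlation measure (Hall's original CFS typically uses symmetrical uncertainty on discretized features) and a plain greedy forward search in place of best-first search. The function names and `top_k` are hypothetical.

```python
import math


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value equal to x (handles ties).
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant gene carries no information
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset, r_cf, r_ff):
    """Hall's CFS merit: k * mean(r_cf) / sqrt(k + k(k-1) * mean(r_ff))."""
    k = len(subset)
    mean_cf = sum(r_cf[g] for g in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    mean_ff = sum(r_ff[a][b] for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)


def cfs_forward(cols, y, candidates):
    """Greedy forward search (assumed; simpler than best-first search):
    add the gene that most improves the merit, stop when none does."""
    # Assumption: |Pearson r| as both gene-class and gene-gene correlation.
    r_cf = {g: abs(pearson(cols[g], y)) for g in candidates}
    r_ff = {a: {b: abs(pearson(cols[a], cols[b])) for b in candidates}
            for a in candidates}
    selected, best = [], 0.0
    remaining = list(candidates)
    while remaining:
        g = max(remaining, key=lambda c: cfs_merit(selected + [c], r_cf, r_ff))
        merit = cfs_merit(selected + [g], r_cf, r_ff)
        if merit <= best:
            break
        selected.append(g)
        remaining.remove(g)
        best = merit
    return selected


def ks_cfs_select(X, y, top_k=50):
    """X: samples x genes (list of rows); y: 0/1 labels. Returns gene indices.
    top_k is a hypothetical cut-off, not a value taken from the paper."""
    cols = [list(c) for c in zip(*X)]
    # Stage 1: rank genes by the K-S statistic between the two classes.
    d = [ks_statistic([v for v, t in zip(col, y) if t == 0],
                      [v for v, t in zip(col, y) if t == 1]) for col in cols]
    candidates = sorted(range(len(cols)), key=lambda g: -d[g])[:top_k]
    # Stage 2: run CFS only on the genes that survived the K-S filter.
    return cfs_forward(cols, y, candidates)
```

The K-S stage costs O(n log n) per gene, so it cheaply prunes thousands of genes before CFS, whose gene-gene correlation table grows quadratically with the number of candidates.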
We adopted support vector machines (SVM) as the classifier and used classification accuracy as the criterion for evaluating classifier performance on the selected gene subsets. We compared the proposed gene subset selection algorithm with the K-S test, CFS, minimum-redundancy maximum-relevance (mRMR), and ReliefF algorithms. Averaged over 5 gene expression datasets, the new K-S and CFS-based algorithm achieved higher accuracy than the K-S test, CFS, mRMR, and ReliefF algorithms.
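The evaluation step (train an SVM on the selected genes, then measure accuracy) can be sketched as below. The abstract does not specify the kernel, the SVM parameters, or the validation scheme, so everything here is an assumption: a linear SVM trained with the Pegasos stochastic sub-gradient solver (a swapped-in solver chosen only to keep the example dependency-free), the bias learned as an extra regularized weight, leave-one-out accuracy, and the hypothetical parameters `lam`, `epochs`, and `seed`.

```python
import random


def _augment(x):
    # Append a constant 1.0 so the bias is learned as an ordinary
    # (regularized) weight; an assumption, not taken from the paper.
    return list(x) + [1.0]


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM via Pegasos (stochastic sub-gradient descent on the primal
    hinge-loss objective). Labels y are 0/1 and mapped to -1/+1."""
    rng = random.Random(seed)
    data = [(_augment(x), 1 if label == 1 else -1) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, yi in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, x))
            # Shrink step from the L2 regularizer ...
            w = [(1.0 - eta * lam) * wj for wj in w]
            # ... plus a hinge-loss step only for margin violations.
            if margin < 1.0:
                w = [wj + eta * yi * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, _augment(x))) >= 0 else 0


def accuracy(w, X, y):
    return sum(predict(w, x) == label for x, label in zip(X, y)) / len(y)


def loo_accuracy(X, y, **svm_kwargs):
    """Leave-one-out accuracy (assumed protocol): train on all samples but
    one, test on the held-out sample, repeat for every sample."""
    correct = 0
    for i in range(len(X)):
        w = train_linear_svm(X[:i] + X[i + 1:], y[:i] + y[i + 1:], **svm_kwargs)
        correct += predict(w, X[i]) == y[i]
    return correct / len(X)
```

Here `X` would hold only the columns of the selected genes. In practice a library SVM such as LIBSVM or scikit-learn's `SVC` would replace this hand-rolled solver.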
The experimental results show that, compared with the K-S test, CFS, mRMR, and ReliefF algorithms, the K-S test-CFS gene selection algorithm is a very effective and promising approach.