Wang Aiguo, An Ning, Chen Guilin, Li Lian, Alterovitz Gil
School of Computer and Information, Hefei University of Technology, Hefei, China.
School of Computer and Information Engineering, Chuzhou University, Chuzhou, China.
Comput Biol Med. 2015 Jul;62:14-24. doi: 10.1016/j.compbiomed.2015.04.011. Epub 2015 Apr 17.
Gene selection plays a crucial role in constructing efficient classifiers for microarray data classification, since microarray data are characterized by high dimensionality and small sample sizes and contain irrelevant and redundant genes. In practice, gene selection approaches based on partial least squares (PLS) can obtain high-quality gene subsets but are considerably time-consuming. In this paper, we propose to integrate PLS-based recursive feature elimination (PLS-RFE) with two feature elimination schemes, simulated annealing and square root, to speed up the feature selection process. Inspired by the strategy of an annealing schedule, both proposed approaches eliminate several features in each iteration, rather than only the single least informative feature, and the number of removed features decreases as the iterations proceed. To verify the effectiveness and efficiency of the proposed approaches, we perform extensive experiments on six publicly available microarray datasets with three typical classifiers (Naïve Bayes, K-Nearest-Neighbor and Support Vector Machine), and compare our approaches with the ReliefF, PLS and PLS-RFE feature selectors in terms of classification accuracy and running time. Experimental results demonstrate that the two proposed approaches substantially accelerate the feature selection process without degrading classification accuracy, and obtain more compact feature subsets for both two-category and multi-category problems. Further comparisons of feature subset consistency show that the approach with the simulated annealing scheme not only runs faster but also achieves slightly better feature subset consistency than the one with the square root scheme.
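The core idea, removing a shrinking batch of low-ranked genes per iteration instead of one gene at a time, can be illustrated with a small sketch. The abstract does not give the exact formulas, so everything below is an assumption for illustration: genes are ranked by the absolute first-component PLS1 weight (for a single centered response this weight vector is proportional to X^T y); the square-root scheme is taken to remove floor(sqrt(m)) of the m remaining genes; and the annealing-style scheme is taken to remove a fraction t0 * alpha**t of the remaining genes that cools geometrically with iteration t, with hypothetical parameters t0 = 0.5 and alpha = 0.8. The function names `pls1_weight_scores`, `sqrt_step`, `annealing_step` and `pls_rfe` are hypothetical, and the sketch handles only a binary 0/1 label (a multi-category version would need a multi-response PLS model).

```python
import math
from typing import Callable, List, Sequence, Tuple


def pls1_weight_scores(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Absolute first-component PLS1 weights; for centered data w is proportional to X^T y."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]  # centered response
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    w = [sum((X[i][j] - col_means[j]) * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w)) or 1.0  # avoid division by zero
    return [abs(v) / norm for v in w]


def sqrt_step(m: int, t: int) -> int:
    """Hypothetical square-root scheme: drop floor(sqrt(m)) of the m remaining genes."""
    return max(1, math.isqrt(m))


def annealing_step(m: int, t: int, t0: float = 0.5, alpha: float = 0.8) -> int:
    """Hypothetical annealing-style scheme: drop a fraction t0 * alpha**t of the m
    remaining genes, so the removal fraction cools geometrically with iteration t."""
    return max(1, int(m * t0 * alpha ** t))


def pls_rfe(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    n_keep: int,
    step_fn: Callable[[int, int], int],
) -> Tuple[List[int], List[int]]:
    """Recursive elimination that drops the step_fn(m, t) lowest-scoring genes per round.
    Returns the kept gene indices and the number of genes removed in each round."""
    remaining = list(range(len(X[0])))
    removed_per_round: List[int] = []
    t = 0
    while len(remaining) > n_keep:
        sub = [[row[j] for j in remaining] for row in X]  # restrict to surviving genes
        scores = pls1_weight_scores(sub, y)  # re-rank on the reduced data each round
        k = min(step_fn(len(remaining), t), len(remaining) - n_keep)  # never overshoot n_keep
        worst = set(sorted(range(len(remaining)), key=lambda i: scores[i])[:k])
        remaining = [g for i, g in enumerate(remaining) if i not in worst]
        removed_per_round.append(k)
        t += 1
    return remaining, removed_per_round


if __name__ == "__main__":
    # Toy data: gene 0 tracks the 0/1 label, genes 1..19 take arbitrary values in 0..4.
    y = [0, 1] * 5
    X = [[10.0 * y[i]] + [float((i * (j + 3) + j) % 5) for j in range(19)] for i in range(10)]
    for name, step in [("sqrt", sqrt_step), ("annealing", annealing_step)]:
        kept, history = pls_rfe(X, y, n_keep=5, step_fn=step)
        print(name, "kept:", kept, "removed per round:", history)
```

In this sketch the two schemes differ only in `step_fn`; passing `lambda m, t: 1` falls back to plain one-gene-per-iteration PLS-RFE, which makes the source of the speed-up (fewer re-rankings) easy to see.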