Ahmed Saeed, Kabir Muhammad, Ali Zakir, Arif Muhammad, Ali Farman, Yu Dong-Jun
School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China.
Comb Chem High Throughput Screen. 2018;21(9):631-645. doi: 10.2174/1386207322666181220124756.
Cancer is a dangerous disease worldwide, caused by somatic mutations in the genome. Early-stage diagnosis of this deadly disease is a new clinical application of microarray data. In DNA microarray technology, gene expression data are high-dimensional but have a small sample size. Efficient and robust feature selection methods are therefore indispensable: they identify a small set of genes that achieves better classification performance.
In this study, we developed a hybrid feature selection method that integrates correlation-based feature selection (CFS) and a Multi-Objective Evolutionary Algorithm (MOEA) to select highly informative genes. The hybrid model, combined with a radial basis function neural network (RBFNN) classifier, was evaluated on 11 benchmark gene expression datasets using 10-fold cross-validation.
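The abstract does not give the scoring details, so the following is a minimal sketch under stated assumptions and not the authors' implementation. It uses Hall's standard CFS merit, Merit = k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean gene-class correlation and r_ff the mean gene-gene intercorrelation of a k-gene subset. Absolute Pearson correlation serves as the correlation measure, which is an assumption. The sketch also assumes the two MOEA objectives are subset merit (maximized) and subset size (minimized), and shows the Pareto-dominance test an MOEA would use to rank candidate subsets. The function names `pearson`, `cfs_merit`, `dominates` and `pareto_front` are hypothetical. In the toy example, exhaustive enumeration of subsets stands in for the evolutionary search.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(X, y, subset):
    """Hall-style CFS merit of a gene subset (hypothetical helper).

    X: samples as rows, genes as columns; y: numeric class labels (e.g. 0/1);
    subset: column indices of the genes being scored.
    Assumption: |Pearson r| is used as the correlation measure.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    cols = {j: [row[j] for row in X] for j in subset}
    # Mean gene-class correlation (relevance).
    r_cf = sum(abs(pearson(cols[j], y)) for j in subset) / k
    # Mean gene-gene intercorrelation (redundancy).
    pairs = list(combinations(subset, 2))
    r_ff = sum(abs(pearson(cols[a], cols[b])) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def dominates(a, b):
    """a, b are (merit, size) pairs: maximise merit, minimise subset size."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])


def pareto_front(candidates):
    """Keep (subset, merit, size) candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates((o[1], o[2]), (c[1], c[2])) for o in candidates)
    ]


if __name__ == "__main__":
    # Toy data: gene 0 matches the class, gene 1 duplicates gene 0, gene 2 is unrelated.
    X = [[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]]
    y = [0, 0, 1, 1]
    cands = [(s, cfs_merit(X, y, s), k)
             for k in range(1, 4) for s in combinations(range(3), k)]
    for s, m, k in pareto_front(cands):
        print(s, round(m, 4), k)
```

On this toy data, the redundant pair (0, 1) scores no higher than gene 0 alone. The unrelated gene 2 lowers the merit of any subset it joins. The Pareto front therefore keeps only the single informative genes, which illustrates why a CFS-style objective paired with a size objective favours small, non-redundant gene subsets.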
The experimental results were compared with those of seven conventional feature selection methods and with other methods from the literature. These extensive comparisons show that our approach has clear advantages in both classification accuracy and the number of genes selected.
The proposed CFS-MOEA algorithm attained up to 100% classification accuracy on six of the eleven datasets with a minimal-sized predictive gene subset.