An Integrated Feature Selection Algorithm for Cancer Classification using Gene Expression Data.

Author information

Ahmed Saeed, Kabir Muhammad, Ali Zakir, Arif Muhammad, Ali Farman, Yu Dong-Jun

Affiliation

School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China.

Publication information

Comb Chem High Throughput Screen. 2018;21(9):631-645. doi: 10.2174/1386207322666181220124756.

Abstract

AIM AND OBJECTIVE

Cancer is a dangerous disease worldwide, caused by somatic mutations in the genome. Diagnosing this deadly disease at an early stage is a new clinical application of microarray data. In DNA microarray technology, gene expression data have high dimensionality and a small sample size. It is therefore indispensable to develop efficient and robust feature selection methods that identify a small set of genes and achieve better classification performance.

MATERIALS AND METHODS

In this study, we developed a hybrid feature selection method that integrates correlation-based feature selection (CFS) with a Multi-Objective Evolutionary Algorithm (MOEA) to select highly informative genes. The hybrid model, paired with a radial basis function neural network (RBFNN) classifier, was evaluated on 11 benchmark gene expression datasets using 10-fold cross-validation.
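The CFS component can be illustrated with the standard CFS merit score, which rewards genes that correlate with the class label and penalizes genes that correlate with each other: merit = k * r_cf / sqrt(k + k(k-1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation, and r_ff the mean feature-feature correlation. The sketch below is not the authors' implementation. It assumes Pearson correlation as the correlation measure (the abstract does not say which one is used), and the function names and toy expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def cfs_merit(features, labels, subset):
    """Standard CFS merit of a gene subset.

    features: list of per-gene expression vectors (one value per sample)
    labels:   class label per sample
    subset:   list of gene indices into `features`
    Assumption: absolute Pearson correlation is used for both r_cf and r_ff.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean correlation between each selected gene and the class label.
    r_cf = sum(abs(pearson(features[i], labels)) for i in subset) / k
    if k == 1:
        return r_cf
    # Mean pairwise correlation among the selected genes (redundancy).
    pairs = [(i, j) for a, i in enumerate(subset) for j in subset[a + 1:]]
    r_ff = sum(abs(pearson(features[i], features[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


# Toy data (made up for illustration): six samples, three genes.
labels = [0, 0, 0, 1, 1, 1]
gene_a = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]  # tracks the class label
gene_b = [1.1, 1.3, 1.0, 3.2, 3.0, 3.1]  # gene_a shifted by 0.1: fully redundant
gene_c = [0.5, 2.0, 0.5, 2.0, 0.5, 2.0]  # weakly related to the label
features = [gene_a, gene_b, gene_c]

merit_a = cfs_merit(features, labels, [0])
merit_ab = cfs_merit(features, labels, [0, 1])
merit_ac = cfs_merit(features, labels, [0, 2])
print(merit_a, merit_ab, merit_ac)
```

In this toy case, adding the redundant gene_b leaves the merit essentially unchanged, and adding the weakly informative gene_c lowers it, which is the behavior that steers CFS toward small, non-redundant gene subsets.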

RESULTS

The experimental results were compared with seven conventional feature selection methods and with other methods in the literature. The comparison shows that our approach has clear merits in terms of classification accuracy and the number of genes selected.
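The two criteria in this comparison, classification accuracy and the number of selected genes, are the kind of competing goals a multi-objective search balances. The sketch below shows Pareto dominance over those two criteria. It assumes that these are the objectives (the abstract does not state the MOEA's objective functions), it is not the authors' algorithm, and the candidate values are made-up toy numbers, not results from the study.

```python
def dominates(a, b):
    """Return True if candidate a dominates candidate b.

    Each candidate is (accuracy, n_genes). Assumed objectives:
    maximize accuracy, minimize the number of genes.
    """
    acc_a, n_a = a
    acc_b, n_b = b
    no_worse = acc_a >= acc_b and n_a <= n_b
    strictly_better = acc_a > acc_b or n_a < n_b
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical (accuracy, n_genes) pairs for five candidate gene subsets.
candidates = [(0.95, 12), (0.95, 20), (1.00, 30), (0.90, 12), (0.98, 18)]
front = pareto_front(candidates)
print(front)
```

An evolutionary algorithm of this kind typically keeps the non-dominated subsets across generations, leaving the final choice between slightly higher accuracy and fewer genes to the user.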

CONCLUSION

Our proposed CFS-MOEA algorithm attained up to 100% classification accuracy on six of the eleven datasets with a minimal-sized predictive gene subset.
