Improving feature selection performance for classification of gene expression data using Harris Hawks optimizer with variable neighborhood learning.

Affiliations

College of Mathematics and Statistics, Hunan Normal University, China.

Department of Pathology and Pathophysiology, Jishou University School of Medicine, Jishou University, China.

Publication information

Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab097.

Abstract

Gene expression profiling has played a significant role in the molecular identification and classification of tumors. In gene expression data, only a few feature genes are closely related to tumors. Selecting highly discriminative feature genes is a challenging task, and existing methods fail to deal with this problem efficiently. This article proposes a novel metaheuristic approach for gene feature selection, called the variable neighborhood learning Harris Hawks optimizer (VNLHHO). First, the F-score is used for a preliminary selection of the genes in the gene expression data to narrow the range of candidate feature genes. Subsequently, a variable neighborhood learning strategy is constructed to balance the global exploration and local exploitation of the Harris Hawks optimization. Finally, mutation operations are employed to increase the diversity of the population and so prevent the algorithm from falling into a local optimum. In addition, a novel activation function is used to convert the continuous solution of the VNLHHO into binary values, and a naive Bayes classifier is used as the fitness function to select feature genes that help classify biological tissues in binary and multi-class cancer problems. An experiment is conducted on gene expression profile data of eight types of tumors. The results show that the classification accuracy of the VNLHHO is greater than 96.128% for tumors of the colon, nervous system and lungs and is 100% for the other five. We compare the VNLHHO with seven other algorithms and demonstrate its superiority in terms of classification accuracy, fitness value and AUC value in feature selection for gene expression data.
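
The filter and wrapper stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the F-score formula is one common definition, the sigmoid in `binarize` is a stand-in for the paper's activation function (which the abstract does not specify), the fitness is assumed to be the hold-out accuracy of a small Gaussian naive Bayes classifier, and all function names are hypothetical. The variable neighborhood learning and mutation steps are not shown.

```python
import numpy as np

def f_score(X, y):
    """Per-gene F-score: squared deviation of class means from the overall
    mean, divided by the summed within-class sample variances.
    (One common definition; assumed, not taken from the paper.)"""
    overall_mean = X.mean(axis=0)
    numer = np.zeros(X.shape[1])
    denom = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        numer += (Xc.mean(axis=0) - overall_mean) ** 2
        denom += Xc.var(axis=0, ddof=1)
    return numer / (denom + 1e-12)  # small constant avoids division by zero

def prefilter(X, y, k):
    """Indices of the k genes with the highest F-score."""
    return np.argsort(f_score(X, y))[::-1][:k]

def binarize(position, rng):
    """Turn a continuous position vector into a boolean gene mask.
    A sigmoid is used here as a stand-in activation function."""
    prob = 1.0 / (1.0 + np.exp(-position))
    return rng.random(position.shape) < prob

def gaussian_nb_accuracy(X_train, y_train, X_test, y_test):
    """Test accuracy of a minimal Gaussian naive Bayes classifier."""
    labels = np.unique(y_train)
    log_post = []
    for label in labels:
        Xc = X_train[y_train == label]
        mean = Xc.mean(axis=0)
        var = Xc.var(axis=0) + 1e-9  # variance smoothing
        log_prior = np.log(len(Xc) / len(X_train))
        log_lik = -0.5 * (np.log(2 * np.pi * var)
                          + (X_test - mean) ** 2 / var).sum(axis=1)
        log_post.append(log_prior + log_lik)
    pred = labels[np.argmax(np.stack(log_post, axis=1), axis=1)]
    return float((pred == y_test).mean())

def fitness(mask, X_train, y_train, X_test, y_test):
    """Assumed fitness: naive Bayes accuracy on the genes selected by mask."""
    if not mask.any():
        return 0.0  # an empty gene subset cannot classify anything
    return gaussian_nb_accuracy(X_train[:, mask], y_train,
                                X_test[:, mask], y_test)
```

In a full optimizer, each hawk's continuous position over the prefiltered genes would be passed through `binarize`, and the resulting mask scored with `fitness`; an actual fitness function might also penalize the number of selected genes, which this sketch omits.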
