A novel hybrid algorithm based on Harris Hawks for tumor feature gene selection.

Authors

Liu Junjian, Feng Huicong, Tang Yifan, Zhang Lupeng, Qu Chiwen, Zeng Xiaomin, Peng Xiaoning

Affiliations

Department of Statistics, Hunan Normal University College of Mathematics and Statistics, Changsha, Hunan, China.

Department of Pathology and Pathophysiology, Hunan Normal University School of Medicine, Changsha, Hunan, China.

Publication information

PeerJ Comput Sci. 2023 Feb 13;9:e1229. doi: 10.7717/peerj-cs.1229. eCollection 2023.

Abstract

BACKGROUND

Gene expression data are often used to classify cancer genes. In such high-dimensional datasets, however, only a few feature genes are closely related to tumors. Therefore, it is important to accurately select a subset of feature genes with high contributions to cancer classification.

METHODS

In this article, a new three-stage hybrid gene selection method (VEH) is proposed that combines a variance filter, extremely randomized trees, and the Harris Hawks algorithm. In the first stage, we evaluate each gene in the dataset with the variance filter and select the feature genes that meet the variance threshold. In the second stage, we use extremely randomized trees to further eliminate irrelevant genes. Finally, we use the Harris Hawks algorithm to search the genes retained by the previous two stages and obtain the optimal feature gene subset.
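As an illustration of the first stage only, the following is a minimal sketch of a variance filter in plain Python. It is not the authors' implementation: the function name `variance_filter`, the toy expression matrix, the threshold value, and the choice of a strict "greater than" comparison on the population variance are all assumptions made for this example.

```python
from statistics import pvariance


def variance_filter(samples, threshold):
    """Return the column indices (genes) whose variance exceeds `threshold`.

    `samples` is a list of rows, one row per sample and one column per gene.
    Using the population variance and a strict '>' comparison is an
    assumption; the abstract does not specify either detail.
    """
    n_genes = len(samples[0])
    kept = []
    for j in range(n_genes):
        column = [row[j] for row in samples]
        if pvariance(column) > threshold:
            kept.append(j)
    return kept


# Toy expression matrix: 4 samples (rows) x 3 genes (columns).
# The values are made up for illustration and are not from the paper.
expression = [
    [1.0, 5.0, 2.0],
    [1.0, 7.0, 2.1],
    [1.0, 3.0, 1.9],
    [1.0, 9.0, 2.0],
]

# Hypothetical threshold; only the middle gene varies enough to pass it.
selected = variance_filter(expression, threshold=0.05)
print(selected)
```

In this toy matrix the first gene is constant and the third barely changes, so only the second gene passes the filter. In the method described above, the genes that survive this stage would then go on to the extremely randomized trees stage and the Harris Hawks search.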

RESULTS

We evaluated the proposed method using three different classifiers on eight published microarray gene expression datasets. VEH achieved 100% classification accuracy for gastric cancer, acute lymphoblastic leukemia and ovarian cancer, and an average classification accuracy of 95.33% across a variety of other cancers. Compared with other advanced feature selection algorithms, VEH shows clear advantages on many evaluation criteria.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d7d4/10280456/a26157cfbc2f/peerj-cs-09-1229-g001.jpg