A Highly Discriminative Hybrid Feature Selection Algorithm for Cancer Diagnosis.

Affiliation

Information Systems Department, Suez Canal University, Ismailia 41522, Egypt.

Publication information

ScientificWorldJournal. 2022 Aug 9;2022:1056490. doi: 10.1155/2022/1056490. eCollection 2022.

DOI:10.1155/2022/1056490
PMID:35983572
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9381276/
Abstract

Cancer is a deadly disease that occurs due to rapid and uncontrolled cell growth. In this article, a machine learning (ML) algorithm is proposed to diagnose different cancer diseases from big data. The algorithm comprises a two-stage hybrid feature selection. In the first stage, an overall ranker is initiated to combine the results of three filter-based feature evaluation methods, namely, chi-squared, F-statistic, and mutual information (MI). The features are then ordered according to this combination. In the second stage, the modified wrapper-based sequential forward selection is utilized to discover the optimal feature subset, using ML models such as support vector machine (SVM), decision tree (DT), random forest (RF), and k-nearest neighbor (KNN) classifiers. To examine the proposed algorithm, many tests have been carried out on four cancerous microarray datasets, employing in the process 10-fold cross-validation and hyperparameter tuning. The performance of the algorithm is evaluated by calculating the diagnostic accuracy. The results indicate that for the leukemia dataset, both SVM and KNN models register the highest accuracy at 100% using only 5 features. For the ovarian cancer dataset, the SVM model achieves the highest accuracy at 100% using only 6 features. For the small round blue cell tumor (SRBCT) dataset, the SVM model also achieves the highest accuracy at 100% using only 8 features. For the lung cancer dataset, the SVM model also achieves the highest accuracy at 99.57% using 19 features. By comparing with other algorithms, the results obtained from the proposed algorithm are superior in terms of the number of selected features and diagnostic accuracy.
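The two-stage idea described in the abstract can be illustrated with a minimal pure-Python sketch. Several parts are assumptions rather than the authors' method: the abstract does not say how the three filter results are combined, so the sketch orders features by their mean rank; it does not describe how the sequential forward selection is modified, so the sketch simply keeps a ranked feature only when it improves accuracy; and a leave-one-out 1-nearest-neighbour classifier stands in for the tuned SVM, DT, RF, and KNN models with 10-fold cross-validation. The filter scores and the tiny dataset are hypothetical toy values, and all function names are illustrative.

```python
from statistics import mean


def ranks(scores):
    """Rank features by score: the highest score gets rank 1."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    rank = [0] * len(scores)
    for position, feature in enumerate(order, start=1):
        rank[feature] = position
    return rank


def combine_rankings(score_lists):
    """Stage 1 (assumed rule): order features by mean rank across filters."""
    per_filter = [ranks(s) for s in score_lists]
    n_features = len(score_lists[0])
    mean_rank = [mean(r[i] for r in per_filter) for i in range(n_features)]
    return sorted(range(n_features), key=lambda i: mean_rank[i])


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for i in range(len(X)):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((X[i][f] - X[j][f]) ** 2 for f in features),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)


def forward_select(X, y, ordered):
    """Stage 2 (assumed rule): keep a ranked feature only if accuracy improves."""
    selected, best = [], 0.0
    for f in ordered:
        acc = loo_accuracy(X, y, selected + [f])
        if acc > best:
            selected, best = selected + [f], acc
    return selected, best


# Hypothetical filter scores for three features (not taken from the paper).
chi2_scores = [9.0, 0.1, 2.0]
f_scores = [50.0, 0.2, 3.0]
mi_scores = [0.5, 0.0, 0.6]

# Toy data: feature 0 separates the two classes, feature 1 is noise.
X = [
    [0.1, 5.0, 1.0],
    [0.2, 1.0, 1.2],
    [0.0, 3.0, 0.9],
    [0.9, 4.8, 1.1],
    [1.0, 1.1, 1.4],
    [0.8, 3.2, 1.3],
]
y = [0, 0, 0, 1, 1, 1]

ordered = combine_rankings([chi2_scores, f_scores, mi_scores])
selected, accuracy = forward_select(X, y, ordered)
print(ordered, selected, accuracy)
```

On this toy input the combined order is feature 0, then 2, then 1, and the wrapper stage stops growing the subset once feature 0 alone classifies every sample correctly, which mirrors the abstract's emphasis on reaching high accuracy with very few features.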

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2132/9381276/58fd774f3c8d/TSWJ2022-1056490.alg.002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2132/9381276/10ac86bb8783/TSWJ2022-1056490.001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2132/9381276/d1e1b9d2d526/TSWJ2022-1056490.002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2132/9381276/be9f10a4f04a/TSWJ2022-1056490.003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2132/9381276/635113a5de0f/TSWJ2022-1056490.alg.001.jpg
