A Highly Discriminative Hybrid Feature Selection Algorithm for Cancer Diagnosis.

Affiliation

Information Systems Department, Suez Canal University, Ismailia 41522, Egypt.

Publication information

ScientificWorldJournal. 2022 Aug 9;2022:1056490. doi: 10.1155/2022/1056490. eCollection 2022.

Abstract

Cancer is a deadly disease caused by rapid and uncontrolled cell growth. This article proposes a machine learning (ML) algorithm to diagnose different cancers from big data. The algorithm comprises a two-stage hybrid feature selection. In the first stage, an overall ranker combines the results of three filter-based feature evaluation methods, namely chi-squared, F-statistic, and mutual information (MI), and the features are ordered according to this combination. In the second stage, a modified wrapper-based sequential forward selection is used to find the optimal feature subset, using ML models such as support vector machine (SVM), decision tree (DT), random forest (RF), and k-nearest neighbor (KNN) classifiers. The proposed algorithm was tested on four cancer microarray datasets using 10-fold cross-validation and hyperparameter tuning, and its performance was evaluated by diagnostic accuracy. For the leukemia dataset, both the SVM and KNN models reach the highest accuracy, 100%, using only 5 features. For the ovarian cancer dataset, the SVM model reaches the highest accuracy, 100%, using only 6 features. For the small round blue cell tumor (SRBCT) dataset, the SVM model again reaches the highest accuracy, 100%, using only 8 features. For the lung cancer dataset, the SVM model reaches the highest accuracy, 99.57%, using 19 features. Compared with other algorithms, the proposed algorithm obtains superior results in terms of both the number of selected features and diagnostic accuracy.
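
The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the three filter results are combined or how the sequential forward selection is modified, so the sketch assumes a mean-rank combination and the classic, unmodified forward selection. The function names (`overall_ranking`, `sequential_forward_selection`) and the `evaluate` callback are hypothetical. In a setup like the one described, the filter scores could come from scikit-learn's `chi2`, `f_classif`, and `mutual_info_classif`, and `evaluate` would return the 10-fold cross-validated accuracy of a classifier trained on the given feature subset.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def ranks_from_scores(scores: Sequence[float]) -> List[int]:
    """Return the rank of each feature (0 = best), where a higher score is better."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, feature in enumerate(order):
        ranks[feature] = position
    return ranks


def overall_ranking(score_lists: Dict[str, Sequence[float]]) -> List[int]:
    """Stage 1: order feature indices by their mean rank across the filter methods.

    Assumption: mean rank is the combination rule (not stated in the abstract).
    """
    per_method = [ranks_from_scores(scores) for scores in score_lists.values()]
    n_features = len(per_method[0])
    mean_rank = [
        sum(ranks[j] for ranks in per_method) / len(per_method)
        for j in range(n_features)
    ]
    # Ties in mean rank are broken by the lower feature index.
    return sorted(range(n_features), key=lambda j: (mean_rank[j], j))


def sequential_forward_selection(
    candidates: Sequence[int],
    evaluate: Callable[[List[int]], float],
    max_features: int,
) -> Tuple[List[int], float]:
    """Stage 2: classic forward selection over the ranked candidates.

    Each round adds the candidate that gives the best score when joined to the
    current subset, and stops once no candidate improves on the best score.
    """
    selected: List[int] = []
    remaining = list(candidates)
    best_score = float("-inf")
    while remaining and len(selected) < max_features:
        trials = [(evaluate(selected + [f]), f) for f in remaining]
        score, feature = max(trials, key=lambda t: t[0])
        if score <= best_score:
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Toy usage with made-up filter scores for four features.
toy_scores = {
    "chi2": [0.1, 5.0, 3.0, 0.2],
    "f_statistic": [0.3, 4.0, 6.0, 0.1],
    "mutual_info": [0.05, 0.9, 0.8, 0.1],
}
toy_gain = {1: 0.6, 2: 0.3}  # hypothetical stand-in for cross-validated accuracy


def toy_evaluate(subset: List[int]) -> float:
    return sum(toy_gain.get(f, 0.0) for f in subset)


ranked = overall_ranking(toy_scores)
subset, score = sequential_forward_selection(ranked, toy_evaluate, max_features=3)
print(ranked, subset, score)
```

Passing the ranked list from stage 1 as the candidate pool keeps the wrapper stage cheap: only the top-ranked features need to be offered to the expensive classifier-based evaluation, which matters for microarray data with thousands of genes.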

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2132/9381276/10ac86bb8783/TSWJ2022-1056490.001.jpg
