BIMSSA: enhancing cancer prediction with salp swarm optimization and ensemble machine learning approaches.

Author information

Panda Pinakshi, Bisoy Sukant Kishoro, Panigrahi Amrutanshu, Pati Abhilash, Sahu Bibhuprasad, Guo Zheshan, Liu Haipeng, Jain Prince

Affiliations

Department of Computer Science and Engineering, C. V. Raman Global University, Bhubaneswar, Odisha, India.

Department of Computer Science and Engineering, Siksha 'O' Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India.

Publication information

Front Genet. 2025 Jan 6;15:1491602. doi: 10.3389/fgene.2024.1491602. eCollection 2024.

DOI:10.3389/fgene.2024.1491602

PMID:39834551

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11743448/

Abstract

BACKGROUND

Cancer rates are rising rapidly, and cancer is a major cause of death worldwide. According to the World Health Organization (WHO), 9.9 million people died from cancer in 2020. Machine learning (ML) can help identify cancer early, reducing deaths. An ML-based cancer diagnostic model can use a patient's genetic information, such as microarray data. Microarray data are high dimensional, which can degrade the performance of ML-based models, so feature selection becomes essential.

METHODS

The Salp Swarm Algorithm (SSA), Improved Maximum Relevance and Minimum Redundancy (IMRMR), and Boruta form the basis of this work's ML-based model, BIMSSA. The BIMSSA model implements a pipelined feature selection method to handle high-dimensional microarray data effectively. First, Boruta and IMRMR were applied to extract relevant gene expression features. Then, SSA was used to optimize the size of the feature subset. On the optimized feature space, five machine learning classifiers, Support Vector Machine (SVM), Random Forest (RF), Extreme Learning Machine (ELM), AdaBoost, and XGBoost, were applied as the base learners. Majority voting was then used to build an ensemble of the top three algorithms. The ensemble ML-based model BIMSSA was evaluated using microarray data from four different cancer types: Adult acute lymphoblastic leukemia and Acute myelogenous leukemia (ALL-AML), Lymphoma, Mixed-lineage leukemia (MLL), and Small round blue cell tumors (SRBCT).
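
The SSA stage of such a pipeline can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a basic single-leader salp swarm update, a 0.5 threshold to turn continuous positions into a feature mask, and a hypothetical toy fitness (`toy_fitness`) standing in for a classifier's error plus a penalty on subset size. All function names and parameter values here are illustrative.

```python
import math
import random

def salp_swarm_select(fitness, n_features, n_salps=20, n_iter=50, seed=0):
    """Minimise fitness(mask) over boolean feature masks with a basic SSA variant."""
    rng = random.Random(seed)
    lb, ub = 0.0, 1.0  # bounds of the continuous search space (assumed)

    def to_mask(pos):
        # Assumption: feature j is selected when its position exceeds 0.5.
        return [p > 0.5 for p in pos]

    swarm = [[rng.random() for _ in range(n_features)] for _ in range(n_salps)]
    best_pos = min(swarm, key=lambda p: fitness(to_mask(p)))[:]
    best_fit = fitness(to_mask(best_pos))

    for it in range(1, n_iter + 1):
        # c1 shrinks over time, moving from exploration to exploitation.
        c1 = 2.0 * math.exp(-((4.0 * it / n_iter) ** 2))
        for i in range(n_salps):
            if i == 0:
                # Leader: random step around the best position (food source).
                for j in range(n_features):
                    step = c1 * ((ub - lb) * rng.random() + lb)
                    if rng.random() >= 0.5:
                        x = best_pos[j] + step
                    else:
                        x = best_pos[j] - step
                    swarm[i][j] = min(ub, max(lb, x))
            else:
                # Follower: move to the midpoint with the salp ahead in the chain.
                swarm[i] = [(a + b) / 2.0 for a, b in zip(swarm[i], swarm[i - 1])]
        for pos in swarm:
            f = fitness(to_mask(pos))
            if f < best_fit:
                best_fit, best_pos = f, pos[:]
    return to_mask(best_pos), best_fit

# Hypothetical stand-in for "classification error + size penalty":
# features 0..2 are treated as informative, every selected feature costs 0.01.
INFORMATIVE = (0, 1, 2)

def toy_fitness(mask):
    missed = sum(1 for j in INFORMATIVE if not mask[j])
    return missed + 0.01 * sum(mask)

mask, fit = salp_swarm_select(toy_fitness, n_features=8)
print(mask, fit)
```

In a real setting the fitness would typically be a cross-validated error of a classifier trained on the genes kept by Boruta and IMRMR, restricted to the masked subset; that choice is an assumption here, not something the abstract specifies.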

RESULTS

In empirical evaluations, the proposed BIMSSA (Boruta + IMRMR + SSA) achieved accuracies of 96.7% on the ALL-AML, 96.2% on the Lymphoma, 95.1% on the MLL, and 97.1% on the SRBCT cancer datasets.
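
As a reading aid, the following sketch shows one way a top-three majority vote and its accuracy could be computed from the base learners' predictions. It is a hedged illustration only: the labels, per-classifier predictions, and validation accuracies are made-up toy values, and the helper names (`top_k_majority_vote`, `accuracy`) are hypothetical, not taken from the paper.

```python
from collections import Counter

def top_k_majority_vote(predictions, val_accuracy, k=3):
    """Pick the k classifiers with the best validation accuracy and vote per sample."""
    top = sorted(val_accuracy, key=val_accuracy.get, reverse=True)[:k]
    voted = []
    for sample_preds in zip(*(predictions[name] for name in top)):
        # Ties resolve to the label seen first, i.e. the best-ranked classifier's vote.
        voted.append(Counter(sample_preds).most_common(1)[0][0])
    return top, voted

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy, made-up data for five samples of a two-class problem.
y_true = ["ALL", "AML", "ALL", "AML", "ALL"]
predictions = {
    "SVM":      ["ALL", "AML", "ALL", "ALL", "ALL"],
    "RF":       ["ALL", "AML", "AML", "AML", "ALL"],
    "ELM":      ["AML", "AML", "ALL", "AML", "AML"],
    "AdaBoost": ["AML", "ALL", "ALL", "AML", "AML"],
    "XGBoost":  ["ALL", "AML", "ALL", "AML", "ALL"],
}
# Hypothetical validation accuracies used only to rank the base learners.
val_accuracy = {"SVM": 0.90, "RF": 0.88, "ELM": 0.80, "AdaBoost": 0.78, "XGBoost": 0.93}

top, voted = top_k_majority_vote(predictions, val_accuracy)
print(top, voted, accuracy(y_true, voted))
```

With these toy values the vote corrects the single errors made by SVM and RF, which is the usual motivation for combining heterogeneous base learners.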

CONCLUSION

The results show that the proposed approach can accurately predict different forms of cancer, which is useful for both physicians and researchers.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f99/11743448/6ff07ba770f7/fgene-15-1491602-g001.jpg



