BIMSSA: enhancing cancer prediction with salp swarm optimization and ensemble machine learning approaches.

Author information

Panda Pinakshi, Bisoy Sukant Kishoro, Panigrahi Amrutanshu, Pati Abhilash, Sahu Bibhuprasad, Guo Zheshan, Liu Haipeng, Jain Prince

Affiliations

Department of Computer Science and Engineering, C. V. Raman Global University, Bhubaneswar, Odisha, India.

Department of Computer Science and Engineering, Siksha 'O' Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India.

Publication information

Front Genet. 2025 Jan 6;15:1491602. doi: 10.3389/fgene.2024.1491602. eCollection 2024.

Abstract

BACKGROUND

Cancer incidence is rising rapidly, and cancer is a major cause of death worldwide. According to the World Health Organization (WHO), 9.9 million people died from cancer in 2020. Machine learning (ML) can help identify cancer early, which may reduce deaths. An ML-based cancer diagnostic model can use a patient's genetic information, such as microarray gene expression data. Microarray data are high dimensional (many genes measured on relatively few samples), which can degrade the performance of ML-based models. Feature selection therefore becomes essential.

METHODS

The Salp Swarm Algorithm (SSA), Improved Maximum Relevance and Minimum Redundancy (IMRMR), and Boruta form the basis of this work's ML-based model, BIMSSA. BIMSSA implements a pipelined feature selection method to handle high-dimensional microarray data effectively. First, Boruta and IMRMR were applied to extract relevant gene expression features. Then, SSA was applied to further reduce the size of the selected feature subset. On the optimized feature space, five machine learning classifiers, Support Vector Machine (SVM), Random Forest (RF), Extreme Learning Machine (ELM), AdaBoost, and XGBoost, were applied as base learners. Majority voting was then used to build an ensemble of the top three algorithms. The ensemble model BIMSSA was evaluated on microarray data from four cancer types: adult acute lymphoblastic leukemia and acute myelogenous leukemia (ALL-AML), Lymphoma, mixed-lineage leukemia (MLL), and small round blue cell tumors (SRBCT).
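The abstract gives no implementation details, so the following is only a minimal sketch, assuming a binary (0/1 feature mask) variant of the standard Salp Swarm Algorithm with a sigmoid transfer function, of how an SSA feature-selection stage and a hard majority vote over base learners could be written. The function names `binary_ssa` and `majority_vote`, the transfer-function steepness, the population size, and the iteration count are hypothetical choices, not the authors' settings. The `fitness` argument stands in for whatever criterion would score a candidate gene subset (for example, a classifier's validation error plus a penalty on the number of genes); the sketch does not reproduce the paper's fitness function.

```python
import math
import random
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple


def binary_ssa(
    n_features: int,
    fitness: Callable[[List[int]], float],
    n_salps: int = 20,
    max_iter: int = 50,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Minimise `fitness` over 0/1 feature masks with a binary salp swarm (sketch)."""
    rng = random.Random(seed)
    lb, ub = 0.0, 1.0  # continuous search bounds for every coordinate

    def to_mask(pos: List[float]) -> List[int]:
        # Hypothetical S-shaped transfer: a sigmoid centred at 0.5 gives the
        # probability that each feature is switched on.
        return [
            1 if rng.random() < 1.0 / (1.0 + math.exp(-10.0 * (x - 0.5))) else 0
            for x in pos
        ]

    salps = [[rng.uniform(lb, ub) for _ in range(n_features)] for _ in range(n_salps)]
    best_mask: Optional[List[int]] = None
    best_fit = math.inf
    food = salps[0][:]  # continuous position of the best solution so far

    def evaluate_all() -> None:
        # Score every salp's mask and update the food source on improvement.
        nonlocal best_mask, best_fit, food
        for pos in salps:
            mask = to_mask(pos)
            f = fitness(mask)
            if f < best_fit:
                best_mask, best_fit, food = mask, f, pos[:]

    evaluate_all()
    for t in range(1, max_iter + 1):
        # c1 shrinks over iterations, shifting from exploration to exploitation.
        c1 = 2.0 * math.exp(-((4.0 * t / max_iter) ** 2))
        for i, pos in enumerate(salps):
            for j in range(n_features):
                if i == 0:
                    # Leader salp moves around the food source.
                    step = c1 * ((ub - lb) * rng.random() + lb)
                    pos[j] = food[j] + step if rng.random() >= 0.5 else food[j] - step
                else:
                    # Followers move to the midpoint with the salp ahead of them.
                    pos[j] = (pos[j] + salps[i - 1][j]) / 2.0
                pos[j] = min(max(pos[j], lb), ub)  # clamp to bounds
        evaluate_all()

    assert best_mask is not None
    return best_mask, best_fit


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Hard majority vote; predictions[m][k] is model m's label for sample k.

    Ties (e.g. three models, three different labels) go to the label seen
    first, i.e. the earliest model in `predictions`.
    """
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]
```

In a real pipeline, `fitness` would presumably train one of the base learners on the genes already pre-filtered by Boruta and IMRMR; here any deterministic function of the mask works. The leader and follower updates follow the original SSA formulation (the leader oscillates around the best-known position with coefficient $c_1 = 2e^{-(4t/T)^2}$, and each follower moves to the midpoint with its predecessor). `majority_vote` would receive the predictions of the three best-performing base learners.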

RESULTS

In empirical evaluations, the proposed BIMSSA (Boruta + IMRMR + SSA) achieved accuracies of 96.7% on ALL-AML, 96.2% on Lymphoma, 95.1% on MLL, and 97.1% on the SRBCT cancer datasets.

CONCLUSION

The results show that the proposed approach can accurately predict different forms of cancer, which could be useful to both physicians and researchers.

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f99/11743448/6ff07ba770f7/fgene-15-1491602-g001.jpg