Advancing breast cancer prediction: Comparative analysis of ML models and deep learning-based multi-model ensembles on original and synthetic datasets.

Authors

Ahmed Kazi Arman, Humaira Israt, Khan Ashiqur Rahman, Hasan Md Shamim, Islam Mukitul, Roy Anik, Karim Mehrab, Uddin Mezbah, Mohammad Ashique, Xames Md Doulotuzzaman

Affiliations

Department of Industrial and Production Engineering, Military Institute of Science and Technology, Dhaka, Bangladesh.

Department of Biomedical Engineering, Military Institute of Science and Technology, Dhaka, Bangladesh.

Publication information

PLoS One. 2025 Jun 18;20(6):e0326221. doi: 10.1371/journal.pone.0326221. eCollection 2025.

Abstract

Breast cancer is a significant global health concern with rising incidence and mortality rates. Current diagnostic methods face challenges, so improved approaches are needed. This study applies a range of machine learning (ML) algorithms, including KNN, SVM, ANN, RF, XGBoost, ensemble models, AutoML, and deep learning (DL) techniques, to breast cancer diagnosis. The objective is to compare the efficiency and accuracy of these models on original and synthetic datasets. The methodology comprises three phases, each with two stages. In the first stage of each phase, multiple ML models were trained and evaluated with stratified K-fold cross-validation. In the second stage, DL-based and AutoML-based ensemble strategies were applied to improve prediction accuracy. In the second and third phases, the data were produced by synthetic data generation methods such as Gaussian Copula and TVAE. The KNN model outperformed the others on the original dataset, while the AutoML approach based on H2OXGBoost, trained on synthetic data, also achieved high accuracy. These findings underscore the effectiveness of traditional ML models and AutoML in predicting breast cancer. The study also demonstrates the potential of synthetic data generation methods to improve prediction performance, which may aid decision-making in the diagnosis and treatment of breast cancer.
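The first-stage evaluation described in the abstract, stratified K-fold cross-validation of a KNN classifier, can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names (`stratified_k_folds`, `knn_predict`, `cross_validate`), the choice of 5 folds and 3 neighbors, and the toy two-cluster data are all assumptions made for illustration, and the study's actual dataset, preprocessing, and hyperparameters are not given in the abstract.

```python
import random
from collections import Counter, defaultdict
from math import dist


def stratified_k_folds(labels, k, seed=0):
    """Split sample indices into k folds that each keep the class balance."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a near-proportional share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def knn_predict(train_X, train_y, x, n_neighbors=3):
    """Majority vote among the n_neighbors closest training points (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))
    votes = Counter(train_y[i] for i in nearest[:n_neighbors])
    return votes.most_common(1)[0][0]


def cross_validate(X, y, k=5, n_neighbors=3, seed=0):
    """Return the per-fold accuracy of KNN under stratified k-fold CV."""
    scores = []
    for held_out in stratified_k_folds(y, k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        hits = sum(
            knn_predict(train_X, train_y, X[i], n_neighbors) == y[i]
            for i in held_out
        )
        scores.append(hits / len(held_out))
    return scores


# Toy data (invented, NOT from the study): two well-separated 2-D clusters
# with an imbalanced 30/20 class split.
rng = random.Random(42)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
X += [(rng.gauss(6, 1), rng.gauss(6, 1)) for _ in range(20)]
y = [0] * 30 + [1] * 20

scores = cross_validate(X, y, k=5)
print("fold accuracies:", scores)
print("mean accuracy:", sum(scores) / len(scores))
```

Stratification matters here because diagnostic datasets are often imbalanced; dealing each class across folds separately keeps the benign/malignant ratio roughly constant in every held-out fold, so per-fold accuracies are comparable.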

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/beeb/12176164/6657f6ba67a2/pone.0326221.g001.jpg
