Advancing breast cancer prediction: Comparative analysis of ML models and deep learning-based multi-model ensembles on original and synthetic datasets.

Author information

Ahmed Kazi Arman, Humaira Israt, Khan Ashiqur Rahman, Hasan Md Shamim, Islam Mukitul, Roy Anik, Karim Mehrab, Uddin Mezbah, Mohammad Ashique, Xames Md Doulotuzzaman

Affiliations

Department of Industrial and Production Engineering, Military Institute of Science and Technology, Dhaka, Bangladesh.

Department of Biomedical Engineering, Military Institute of Science and Technology, Dhaka, Bangladesh.

Publication information

PLoS One. 2025 Jun 18;20(6):e0326221. doi: 10.1371/journal.pone.0326221. eCollection 2025.

DOI:10.1371/journal.pone.0326221

PMID:40531928

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12176164/

Abstract

Breast cancer is a significant global health concern with rising incidence and mortality rates. Current diagnostic methods face challenges, necessitating improved approaches. This study employs various machine learning (ML) algorithms, including KNN, SVM, ANN, RF, XGBoost, ensemble models, AutoML, and deep learning (DL) techniques, to enhance breast cancer diagnosis. The objective is to compare the efficiency and accuracy of these models on original and synthetic datasets, contributing to the advancement of breast cancer diagnosis. The methodology comprises three phases, each with two stages. In the first stage of each phase, stratified K-fold cross-validation was performed to train and evaluate multiple ML models. The second stage involved DL-based and AutoML-based ensemble strategies to improve prediction accuracy. In the second and third phases, synthetic data generation methods, such as Gaussian Copula and TVAE, were utilized. The KNN model outperformed the others on the original dataset, while the H2OXGBoost-based AutoML approach using synthetic data also showed high accuracy. These findings underscore the effectiveness of traditional ML models and AutoML in predicting breast cancer. Additionally, the study demonstrated the potential of synthetic data generation methods to improve prediction performance, aiding decision-making in the diagnosis and treatment of breast cancer.
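
The first stage of each phase, stratified K-fold cross-validation, can be illustrated with a minimal sketch. It uses only the Python standard library and a hypothetical two-feature toy dataset (`make_toy_data`), not the study's data. The helper names, the choice of k = 5, and the KNN settings are assumptions for illustration and are not taken from the paper.

```python
import random
from collections import Counter, defaultdict
from math import dist


def stratified_kfold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs with class ratios preserved per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class's samples round-robin so every fold gets its share.
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def knn_predict(train_X, train_y, x, n_neighbors=5):
    """Majority vote among the n_neighbors closest training points (Euclidean)."""
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))
    nearest = order[:n_neighbors]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cross_val_accuracy(X, y, k=5, n_neighbors=5, seed=0):
    """Mean KNN accuracy over k stratified folds."""
    scores = []
    for train, test in stratified_kfold(y, k, seed):
        tX = [X[i] for i in train]
        ty = [y[i] for i in train]
        hits = sum(knn_predict(tX, ty, X[i], n_neighbors) == y[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)


def make_toy_data(n_benign=60, n_malignant=40, seed=0):
    """Hypothetical data: two Gaussian blobs, NOT the study's dataset."""
    rng = random.Random(seed)
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_benign)]
    X += [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(n_malignant)]
    y = [0] * n_benign + [1] * n_malignant
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(f"mean 5-fold accuracy: {cross_val_accuracy(X, y):.3f}")
```

Stratifying keeps the benign/malignant ratio the same in every fold, so the per-fold accuracies are comparable even when the classes are imbalanced. In practice a library routine such as scikit-learn's `StratifiedKFold` would typically replace the hand-written splitter.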

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/beeb/12176164/6657f6ba67a2/pone.0326221.g001.jpg
