Comparing Supervised and Semi-supervised Machine Learning Models on Diagnosing Breast Cancer

Authors

Al-Azzam Nosayba, Shatnawi Ibrahem

Affiliations

Department of Physiology and Biochemistry, Faculty of Medicine, Jordan University of Science and Technology, Irbid, 22110, Jordan.

Independent Researcher in Data Analytics, Jordan.

Publication information

Ann Med Surg (Lond). 2021 Jan 8;62:53-64. doi: 10.1016/j.amsu.2020.12.043. eCollection 2021 Feb.

DOI: 10.1016/j.amsu.2020.12.043

PMID: 33489117

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7806524/

Abstract

BACKGROUND

Breast cancer is the most common cancer in US women and the second leading cause of cancer death among women.

OBJECTIVES

To compare and evaluate the performance and accuracy of the key supervised and semi-supervised machine learning algorithms for breast cancer prediction.

MATERIALS AND METHODS

We used nine machine learning classification algorithms for supervised learning (SL) and semi-supervised learning (SSL): 1) logistic regression; 2) Gaussian naive Bayes; 3) linear support vector machine; 4) RBF support vector machine; 5) decision tree; 6) random forest; 7) XGBoost; 8) gradient boosting; 9) KNN. The Wisconsin Diagnostic Breast Cancer dataset was used to train and test these models. To ensure the robustness of the models, we applied K-fold cross-validation and optimized hyperparameters. We evaluated and compared the models using accuracy, precision, recall, F1-score, and ROC curves.
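The evaluation steps named above (K-fold splitting, then accuracy, precision, recall, and F1-score) can be sketched in plain Python. This is a minimal illustration, not the authors' code: the helper names `k_fold_indices` and `binary_metrics` are hypothetical, the folds are unshuffled, and the label convention (1 as the positive class) is an assumption.

```python
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k contiguous (train, test) folds.

    Hypothetical helper: no shuffling or stratification is applied.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first (n_samples % k) folds receive one extra sample each.
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 from the binary confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

In a full pipeline, each model would be fitted on the `train` indices of a fold and scored with `binary_metrics` on the `test` indices, and the per-fold scores would then be averaged.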

RESULTS

The results of all models are encouraging for both SL and SSL. SSL achieves high accuracy (90%-98%) using just half of the training data. The KNN model for SL and logistic regression for SSL achieved the highest accuracy, 98%.
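The abstract does not say which SSL procedure was used. One common scheme is self-training (pseudo-labeling): fit on the labeled samples, label the unlabeled samples the model is confident about, and refit. The sketch below illustrates that idea on toy 2-D points with a nearest-centroid classifier. The classifier, the `min_margin` confidence rule, and all function names are assumptions made for illustration, not the paper's method.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def centroids(X: Sequence[Point], y: Sequence[int]) -> Dict[int, Point]:
    """Mean point of each class label."""
    sums: Dict[int, Tuple[float, float, int]] = {}
    for (a, b), label in zip(X, y):
        sa, sb, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sa + a, sb + b, n + 1)
    return {label: (sa / n, sb / n) for label, (sa, sb, n) in sums.items()}


def predict_with_margin(cents: Dict[int, Point], x: Point) -> Tuple[int, float]:
    """Return the nearest class and the distance gap to the runner-up class."""
    dists = sorted((math.dist(x, c), label) for label, c in cents.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def self_train(X_lab: Sequence[Point], y_lab: Sequence[int],
               X_unlab: Sequence[Point], min_margin: float = 1.0,
               max_rounds: int = 10) -> Tuple[Dict[int, Point], List[Point]]:
    """Self-training loop: pseudo-label confident points, then refit.

    Returns the final centroids and the points that were never labeled.
    """
    X_train, y_train = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    for _ in range(max_rounds):
        cents = centroids(X_train, y_train)
        confident, remaining = [], []
        for x in pool:
            label, margin = predict_with_margin(cents, x)
            if margin >= min_margin:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing new to learn from; stop early
        for x, label in confident:
            X_train.append(x)
            y_train.append(label)
        pool = remaining
    return centroids(X_train, y_train), pool
```

With four labeled points and five unlabeled ones, the four unlabeled points near a cluster are absorbed in the first round, while an ambiguous point midway between the clusters stays unlabeled:

```python
cents, leftover = self_train(
    [(0, 0), (1, 0), (10, 10), (9, 10)], [0, 0, 1, 1],
    [(0, 1), (1, 1), (10, 9), (9, 9), (5, 5)],
)
```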

CONCLUSION

The accuracies of the SSL algorithms are very close to those of the SL algorithms. The accuracies of all models are in the range of 91%-98%. SSL is a promising and competitive approach to this problem. Using a small sample of labeled data and low computational power, SSL is fully capable of replacing SL algorithms in diagnosing tumor type.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/309a/7806524/d0f52133dc7d/gr1.jpg
