Comparing supervised and semi-supervised Machine Learning Models on Diagnosing Breast Cancer.

Authors

Al-Azzam Nosayba, Shatnawi Ibrahem

Affiliations

Department of Physiology and Biochemistry, Faculty of Medicine, Jordan University of Science and Technology, Irbid, 22110, Jordan.

Independent Researcher in Data Analytics, Jordan.

Publication information

Ann Med Surg (Lond). 2021 Jan 8;62:53-64. doi: 10.1016/j.amsu.2020.12.043. eCollection 2021 Feb.

Abstract

BACKGROUND

Breast cancer is the most common cancer among US women and the second leading cause of cancer death among women.

OBJECTIVES

To compare and evaluate the performance and accuracy of the key supervised and semi-supervised machine learning algorithms for breast cancer prediction.

MATERIALS AND METHODS

We used nine machine learning classification algorithms for supervised learning (SL) and semi-supervised learning (SSL): 1) logistic regression; 2) Gaussian naive Bayes; 3) linear support vector machine (SVM); 4) RBF SVM; 5) decision tree; 6) random forest; 7) XGBoost; 8) gradient boosting; 9) k-nearest neighbors (KNN). The Wisconsin Diagnostic Breast Cancer dataset was used to train and test these models. To ensure the robustness of the models, we applied K-fold cross-validation and optimized hyperparameters. We evaluated and compared the models using accuracy, precision, recall, F1-score, and ROC curves.
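The evaluation protocol described above can be illustrated with a minimal sketch. It assumes scikit-learn and its bundled copy of the Wisconsin Diagnostic Breast Cancer data (`load_breast_cancer`). The abstract does not state the number of folds, the preprocessing, or the tuned hyperparameters, so the 5 folds, the feature scaling, and the default settings here are assumptions, and only four of the nine listed algorithms are shown for brevity.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# 569 samples, 30 numeric features; in scikit-learn's encoding
# target 0 = malignant and 1 = benign.
X, y = load_breast_cancer(return_X_y=True)

# A subset of the nine algorithms, with default (untuned) hyperparameters.
# Scaling is added for the distance- and gradient-based models (assumption).
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "gaussian_nb": GaussianNB(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

# Stratified 5-fold cross-validation (the fold count is an assumption).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Precision, recall, and F1 are computed for the positive class,
# which is "benign" (label 1) in this encoding.
scoring = ["accuracy", "precision", "recall", "f1", "roc_auc"]

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    # Average each metric over the folds.
    results[name] = {m: scores[f"test_{m}"].mean() for m in scoring}

for name, metrics in results.items():
    print(name, {m: round(v, 3) for m, v in metrics.items()})
```

Hyperparameter optimization, which the abstract mentions, could be layered on by wrapping each estimator in `GridSearchCV`; the search grids would again be a choice of the reader, not something the abstract specifies.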

RESULTS

The results of all models are encouraging for both SL and SSL. SSL achieved high accuracy (90%-98%) with just half of the training data. The KNN model for SL and logistic regression for SSL achieved the highest accuracy, 98%.
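To make the "half of the training data" setting concrete, here is a hedged sketch of one common SSL scheme, self-training, with logistic regression as the base model. The abstract does not say which SSL technique the authors used, so `SelfTrainingClassifier`, the 0.9 confidence threshold, the 75/25 train/test split, and the random masking of labels are all assumptions made for illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hide roughly half of the training labels; scikit-learn marks
# unlabeled samples with -1.
rng = np.random.RandomState(0)
hidden = rng.rand(len(y_train)) < 0.5
y_partial = y_train.copy()
y_partial[hidden] = -1

base = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Self-training: fit on the labeled half, then iteratively pseudo-label
# unlabeled samples whose predicted probability exceeds the threshold.
ssl_model = SelfTrainingClassifier(clone(base), threshold=0.9)
ssl_model.fit(X_train, y_partial)
ssl_acc = accuracy_score(y_test, ssl_model.predict(X_test))

# Supervised reference: the same base model trained on all labels.
sl_model = clone(base).fit(X_train, y_train)
sl_acc = accuracy_score(y_test, sl_model.predict(X_test))

print(f"labeled samples used by SSL: {(~hidden).sum()} of {len(y_train)}")
print(f"SSL accuracy: {ssl_acc:.3f}")
print(f"SL accuracy:  {sl_acc:.3f}")
```

The numbers printed by this sketch depend on the split and the masking seed, so they should not be read as a reproduction of the accuracies reported in the paper.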

CONCLUSION

The accuracies of the SSL algorithms are very close to those of the SL algorithms. The accuracies of all models are in the range of 91%-98%. SSL is a promising and competitive approach to this problem. Using a small sample of labeled data and low computational power, SSL is fully capable of replacing SL algorithms in diagnosing tumor type.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/309a/7806524/d0f52133dc7d/gr1.jpg
