Comparison of the performance of machine learning algorithms in breast cancer screening and detection: A protocol.

Authors

Salod Zakia, Singh Yashik

Affiliation

Department of TeleHealth, University of KwaZulu-Natal, Durban, South Africa.

Publication

J Public Health Res. 2019 Dec 4;8(3):1677. doi: 10.4081/jphr.2019.1677.

Abstract

Breast Cancer (BC) is a known global crisis. The World Health Organization reported 2.09 million new cases of BC and 627,000 BC-related deaths worldwide in 2018. The traditional BC screening method in developed countries is mammography, whilst developing countries employ breast self-examination and clinical breast examination. The gold standard for BC detection is triple assessment: i) clinical examination; ii) mammography and/or ultrasonography; and iii) Fine Needle Aspirate Cytology. However, the introduction of cheaper, efficient and noninvasive methods of BC screening and detection would be beneficial. We propose the use of eight machine learning algorithms: i) Logistic Regression; ii) Support Vector Machine; iii) k-Nearest Neighbors; iv) Decision Tree; v) Random Forest; vi) Adaptive Boosting; vii) Gradient Boosting; and viii) eXtreme Gradient Boosting, together with blood test results from the BC Coimbra Dataset (BCCD) in the University of California Irvine online database, to create models for BC prediction. To ensure the models' robustness, we will employ: i) Stratified k-fold Cross-Validation; ii) Correlation-based Feature Selection (CFS); and iii) parameter tuning. The models will be validated on validation and test sets of BCCD for both full and reduced feature sets, since feature reduction has an impact on algorithm performance. Seven metrics, including accuracy, will be used for model evaluation. CFS, together with the highest-performing model(s), can help identify specific blood tests that point towards BC and may serve as important BC biomarkers. The highest-performing model(s) may eventually be used to create an Artificial Intelligence tool to assist clinicians in BC screening and detection.
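
The stratified k-fold cross-validation step named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the function name `stratified_k_fold`, the choice of k = 5, the seed, and the synthetic labels are all assumptions made for illustration, and no BCCD data is used.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose folds preserve class proportions.

    Hypothetical helper for illustration; not taken from the protocol.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal indices round-robin into k folds
    # so that every fold receives a near-equal share of each class.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    # Each fold serves once as the held-out set; the rest form the training set.
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Synthetic labels for illustration only (assumed, not BCCD): 1 = BC, 0 = control.
labels = [1] * 60 + [0] * 50
splits = list(stratified_k_fold(labels, k=5))
for fold_no, (train_idx, test_idx) in enumerate(splits, start=1):
    positives = sum(labels[i] for i in test_idx)
    print(f"fold {fold_no}: test size={len(test_idx)}, positives={positives}")
```

Because every held-out fold keeps roughly the same ratio of BC cases to controls as the full label set, a metric such as accuracy is computed on comparable class mixes in each fold. In practice, a library routine such as scikit-learn's `StratifiedKFold` would normally replace a hand-written splitter like this one.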

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/050f/6902303/6603cf90fdb7/jphr-8-3-1677-g001.jpg
