Prediction of Breast Cancer using Machine Learning Approaches.

Author information

Rabiei Reza, Ayyoubzadeh Seyed Mohammad, Sohrabei Solmaz, Esmaeili Marzieh, Atashi Alireza

Affiliations

PhD, Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran.

PhD, Department of Health Information Technology and Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran.

Publication information

J Biomed Phys Eng. 2022 Jun 1;12(3):297-308. doi: 10.31661/jbpe.v0i0.2109-1403. eCollection 2022 Jun.

Abstract

BACKGROUND

Breast cancer is one of the most common cancers in women and is caused by various clinical, lifestyle, social, and economic factors. Machine learning has the potential to predict breast cancer from features hidden in the data.

OBJECTIVE

This study aimed to predict breast cancer by applying different machine-learning approaches to demographic, laboratory, and mammographic data.

MATERIAL AND METHODS

In this analytical study, the database was obtained from the Motamed Cancer Institute (ACECR), Tehran, Iran. It contained 5,178 independent records with 24 attributes each, 25% of which belonged to breast cancer patients. Random forest (RF), neural network (MLP), gradient boosting trees (GBT), and genetic algorithms (GA) were used. The models were first trained with the demographic and laboratory features only (20 features). They were then trained with all demographic, laboratory, and mammographic features (24 features) to measure how much the mammographic features contribute to predicting breast cancer.
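The two-stage comparison described above (20 features, then all 24) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn, uses synthetic data from `make_classification` in place of the Motamed Cancer Institute records, and arbitrarily treats the last four columns as the mammographic features. The hyperparameters, the 70/30 split, and the helper names `make_models` and `evaluate` are all hypothetical. The genetic algorithm is left out because the abstract does not say how it was applied.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 5,178 records, 24 features, roughly 25% positive class.
# These are NOT the study data; they only mimic the reported shape.
X, y = make_classification(
    n_samples=5178, n_features=24, n_informative=10,
    weights=[0.75, 0.25], random_state=0,
)

N_BASE = 20  # assumption: columns 0-19 stand for demographic + laboratory features
N_ALL = 24   # assumption: columns 20-23 stand for mammographic features

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def make_models():
    """Return fresh, untrained models (hyperparameters are illustrative)."""
    return {
        "RF": RandomForestClassifier(n_estimators=100, random_state=0),
        "MLP": make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
        ),
        "GBT": GradientBoostingClassifier(random_state=0),
    }


def evaluate(n_features):
    """Train each model on the first n_features columns and return test AUCs."""
    scores = {}
    for name, model in make_models().items():
        model.fit(X_train[:, :n_features], y_train)
        prob = model.predict_proba(X_test[:, :n_features])[:, 1]
        scores[name] = roc_auc_score(y_test, prob)
    return scores


auc_base = evaluate(N_BASE)  # stage 1: 20 features
auc_all = evaluate(N_ALL)    # stage 2: all 24 features

for name in auc_base:
    delta = auc_all[name] - auc_base[name]
    print(f"{name}: AUC base={auc_base[name]:.3f} all={auc_all[name]:.3f} delta={delta:+.3f}")
```

The per-model difference between the two AUC values is one simple way to express how much the extra four features add. On synthetic data the numbers carry no clinical meaning.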

RESULTS

RF outperformed the other techniques (accuracy 80%, sensitivity 95%, specificity 80%, and area under the curve (AUC) 0.56). Gradient boosting (AUC = 0.59) outperformed the neural network.
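For readers unfamiliar with the reported metrics, the sketch below shows how accuracy, sensitivity, specificity, and AUC are conventionally computed from labels and model scores. It uses only the Python standard library. The toy labels, the scores, and the 0.5 decision threshold are made up for illustration and are unrelated to the study's data.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on positives), specificity (recall on negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): 3 positives, 5 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = classification_metrics(y_true, y_pred)
metrics["auc"] = auc(y_true, scores)
print(metrics)
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds.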

CONCLUSION

Combining multiple risk factors in breast cancer prediction models could support early diagnosis of the disease along with the necessary care plans. Collecting, storing, and managing diverse data, together with intelligent systems that draw on multiple factors to predict breast cancer, are effective in disease management.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e6b/9175124/ac850168638f/JBPE-12-297-g001.jpg
