

Critical Risk Assessment, Diagnosis, and Survival Analysis of Breast Cancer.

Authors

Manir Shamiha Binta, Deshpande Priya

Affiliation

Department of EECE, Marquette University, Milwaukee, WI 53233, USA.

Publication

Diagnostics (Basel). 2024 May 8;14(10):984. doi: 10.3390/diagnostics14100984.

Abstract

Breast cancer is the most prevalent type of cancer in women. Risk factor assessment can help guide counseling on risk reduction and breast cancer surveillance. This research aims to (1) investigate the relationship between various risk factors and breast cancer incidence using the BCSC (Breast Cancer Surveillance Consortium) Risk Factor Dataset and create a prediction model for assessing the risk of developing breast cancer; (2) diagnose breast cancer using the Breast Cancer Wisconsin diagnostic dataset; and (3) analyze breast cancer survivability using the SEER (Surveillance, Epidemiology, and End Results) Breast Cancer Dataset. Applying resampling techniques to the training dataset before fitting machine learning models can affect the performance of the classifiers. The three breast cancer datasets were examined using a variety of pre-processing approaches and classification models, and performance was assessed in terms of accuracy, precision, F1 score, and related metrics. The PCA (principal component analysis) and resampling strategies produced notable results. For the BCSC Dataset, the Random Forest algorithm performed best among the applied classifiers, with an accuracy of 87.53%. Among the resampling techniques applied to the training dataset for the Random Forest classifier, Tomek Link gave the best test accuracy, at 87.47%. After the resampling techniques were applied, test accuracy decreased even when training accuracy increased. All models used were compared with previously used techniques. For the Breast Cancer Wisconsin diagnostic dataset, the K-Nearest Neighbor algorithm had the best accuracy on the test set of the original dataset, at 94.71%, and the test set of the PCA dataset reached 95.29% accuracy for detecting breast cancer. Using the SEER Dataset, this study also explores survival analysis, employing supervised and unsupervised learning approaches to offer insights into the variables affecting breast cancer survivability. By incorporating phenotypic variations and recognizing the heterogeneity of the disease, this study emphasizes the significance of individualized approaches in the management and treatment of breast cancer. Through data-driven insights and machine learning, it contributes to ongoing efforts in breast cancer research, diagnostics, and personalized medicine.
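To make the resampling step mentioned in the abstract more concrete, the following is a minimal sketch of Tomek link removal in plain Python. It is not the authors' implementation (the abstract does not say which library or settings were used); the function names, the toy points, and the choice to drop only the majority-class member of each link are assumptions made for illustration. A Tomek link is a pair of samples from different classes that are each other's nearest neighbor; removing the majority-class member cleans the class boundary. As in the abstract, this would be applied to the training split only, never to the test split.

```python
from math import dist


def nearest_neighbor(i, points):
    """Return the index of the point closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: dist(points[i], points[j]),
    )


def tomek_links(points, labels):
    """Return pairs (i, j) with i < j that are mutual nearest neighbors
    and carry different class labels."""
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]


def remove_tomek_majority(points, labels, majority_label):
    """Drop the majority-class member of every Tomek link (an assumed policy)."""
    drop = {
        k
        for pair in tomek_links(points, labels)
        for k in pair
        if labels[k] == majority_label
    }
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


# Hypothetical toy training set: label 0 is the majority class, 1 the minority.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.5), (1.0, 1.0), (1.1, 1.0), (2.0, 2.0)]
labels = [0, 0, 0, 0, 1, 1]

links = tomek_links(points, labels)
clean_points, clean_labels = remove_tomek_majority(points, labels, majority_label=0)
print("Tomek links:", links)
print("Remaining labels:", clean_labels)
```

In this toy set, the majority point (1.0, 1.0) and the minority point (1.1, 1.0) are mutual nearest neighbors with different labels, so they form the only link and the majority point is removed. The brute-force neighbor search is quadratic in the number of samples, so a dataset of realistic size would need an indexed nearest-neighbor search instead.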


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc86/11119540/6cfdba27dde0/diagnostics-14-00984-g001.jpg
