Application of SHAP for Explainable Machine Learning on Age-Based Subgrouping Mammography Questionnaire Data for Positive Mammography Prediction and Risk Factor Identification.

Author information

Sun Jeffrey, Sun Cheuk-Kay, Tang Yun-Xuan, Liu Tzu-Chi, Lu Chi-Jie

Affiliations

Department of Acute Medicine, West Middlesex University Hospital, London TW7 6AF, UK.

School of Medicine, Imperial College London, London SW7 2BX, UK.

Publication information

Healthcare (Basel). 2023 Jul 11;11(14):2000. doi: 10.3390/healthcare11142000.

Abstract

Mammography is considered the gold standard for breast cancer screening. Multiple risk factors that affect breast cancer development have been identified; however, the significance of these factors is still debated. Machine learning (ML) models combined with the Shapley Additive Explanation (SHAP) methodology can rank risk factors and make model results explainable. This study used ML algorithms with SHAP to compare risk factors between two age groups and to evaluate the impact of each factor in predicting positive mammography. The ML models were built using data from the risk factor questionnaires of women who participated in a breast cancer screening program from 2017 to 2021. Three ML models were applied: least absolute shrinkage and selection operator (lasso) logistic regression, extreme gradient boosting (XGBoost), and random forest (RF). RF performed best, so SHAP values were then computed for the RF model for further analysis. The model identified age at menarche, education level, parity, breast self-examination, and BMI as the top five significant risk factors affecting mammography outcomes. Comparing the two age groups, reproductive lifespan ranked higher in the younger group, whereas BMI ranked higher in the older group. The SHAP framework makes it possible to understand the relationships between risk factors and to generate individualized risk factor rankings. This study provides avenues for further research and individualized medicine.
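
As a rough illustration of how Shapley values can produce an individualized risk factor ranking, the sketch below computes exact Shapley values for a toy logistic model using only the Python standard library. It is not the authors' pipeline: the study applied SHAP to a trained RF model, whereas here the model, its coefficients, the three feature names, and the baseline and individual values are all hypothetical and chosen only to keep the example small. Features absent from a coalition are set to a single baseline value, which is one simple convention among several.

```python
from itertools import combinations
from math import exp, factorial

# Hypothetical feature names and coefficients, for illustration only;
# they are not taken from the study.
FEATURES = ["age_at_menarche", "parity", "bmi"]
COEFS = {"age_at_menarche": -0.30, "parity": -0.20, "bmi": 0.10}
INTERCEPT = 1.0


def predict(row):
    """Toy logistic model returning a probability of a positive mammogram."""
    z = INTERCEPT + sum(COEFS[name] * row[name] for name in FEATURES)
    return 1.0 / (1.0 + exp(-z))


def shapley_values(model, row, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(FEATURES)

    def value(coalition):
        mixed = {f: (row[f] if f in coalition else baseline[f]) for f in FEATURES}
        return model(mixed)

    phi = {}
    for feature in FEATURES:
        others = [f for f in FEATURES if f != feature]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                gain = value(set(subset) | {feature}) - value(set(subset))
                total += weight * gain
        phi[feature] = total
    return phi


# Hypothetical reference values and one hypothetical respondent.
baseline = {"age_at_menarche": 13.0, "parity": 2.0, "bmi": 24.0}
person = {"age_at_menarche": 11.0, "parity": 0.0, "bmi": 30.0}

phi = shapley_values(predict, person, baseline)

# Individualized ranking: largest absolute contribution first.
ranking = sorted(phi, key=lambda name: abs(phi[name]), reverse=True)
for name in ranking:
    print(f"{name}: {phi[name]:+.4f}")
```

The contributions sum to the difference between the individual's predicted probability and the baseline prediction, which is the additivity property that gives SHAP its name. Exact enumeration grows exponentially with the number of features, so for a real questionnaire with many items a dedicated library would be the practical choice.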

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/041e/10379972/6c76b21c5516/healthcare-11-02000-g001.jpg
