Application of SHAP for Explainable Machine Learning on Age-Based Subgrouping Mammography Questionnaire Data for Positive Mammography Prediction and Risk Factor Identification.

Author information

Sun Jeffrey, Sun Cheuk-Kay, Tang Yun-Xuan, Liu Tzu-Chi, Lu Chi-Jie

Affiliations

Department of Acute Medicine, West Middlesex University Hospital, London TW7 6AF, UK.

School of Medicine, Imperial College London, London SW7 2BX, UK.

Publication information

Healthcare (Basel). 2023 Jul 11;11(14):2000. doi: 10.3390/healthcare11142000.

DOI:10.3390/healthcare11142000

PMID:37510441

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10379972/

Abstract

Mammography is considered the gold standard for breast cancer screening. Multiple risk factors that affect breast cancer development have been identified; however, their relative significance remains debated. Machine learning (ML) models combined with the SHapley Additive exPlanations (SHAP) methodology can rank risk factors and make model results explainable. This study used ML algorithms with SHAP to compare risk factors between two age groups and to evaluate the impact of each factor on predicting positive mammography. The ML models were built using data from the risk factor questionnaires of women participating in a breast cancer screening program from 2017 to 2021. Three ML models were applied: least absolute shrinkage and selection operator (lasso) logistic regression, extreme gradient boosting (XGBoost), and random forest (RF). RF performed best. SHAP values were then computed for the RF model for further analysis. The model identified age at menarche, education level, parity, breast self-examination, and BMI as the top five risk factors affecting mammography outcomes. The two age groups differed in their rankings: reproductive lifespan ranked higher in the younger group, whereas BMI ranked higher in the older group. The SHAP framework makes it possible to understand the relationships between risk factors and to generate individualized risk factor rankings. This study provides avenues for further research and individualized medicine.
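The idea behind ranking risk factors with SHAP within age subgroups can be illustrated with a small sketch. The code below is not the authors' pipeline: it replaces the random forest with a hypothetical hand-written scoring function (`toy_risk`), uses made-up respondents, assumes an age cut-off of 50 that the abstract does not state, and computes exact Shapley values by brute force rather than calling the SHAP library. Feature names are borrowed from the abstract for readability only.

```python
from itertools import combinations
from math import factorial

# Hypothetical subset of questionnaire items (names taken from the abstract).
FEATURES = ["age_at_menarche", "parity", "bmi"]


def toy_risk(x):
    """Hypothetical risk score standing in for a fitted model (not the study's RF)."""
    score = 0.10
    score += 0.02 * (14 - x["age_at_menarche"])   # earlier menarche raises the score
    score += 0.05 if x["parity"] == 0 else 0.0     # nulliparity raises the score
    score += 0.01 * max(x["bmi"] - 25.0, 0.0)      # BMI above 25 raises the score
    return score


def coalition_value(model, x, background, known):
    """Mean model output when features in `known` come from x and the rest from background rows."""
    total = 0.0
    for row in background:
        mixed = {f: (x[f] if f in known else row[f]) for f in FEATURES}
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley value of each feature for one respondent (exponential in len(FEATURES))."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        phi[f] = 0.0
        for k in range(n):  # k = size of the coalition drawn from the other features
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                known = set(subset)
                gain = (coalition_value(model, x, background, known | {f})
                        - coalition_value(model, x, background, known))
                phi[f] += weight * gain
    return phi


def rank_features(model, rows, background):
    """Order features by mean absolute Shapley value over the given rows."""
    totals = {f: 0.0 for f in FEATURES}
    for x in rows:
        for f, v in shapley_values(model, x, background).items():
            totals[f] += abs(v) / len(rows)
    return sorted(FEATURES, key=lambda f: totals[f], reverse=True)


# Made-up respondents; none of these values come from the study.
COHORT = [
    {"age": 42, "age_at_menarche": 11, "parity": 0, "bmi": 22.0},
    {"age": 46, "age_at_menarche": 15, "parity": 2, "bmi": 24.5},
    {"age": 48, "age_at_menarche": 12, "parity": 1, "bmi": 23.0},
    {"age": 55, "age_at_menarche": 13, "parity": 2, "bmi": 31.0},
    {"age": 61, "age_at_menarche": 14, "parity": 3, "bmi": 27.5},
    {"age": 67, "age_at_menarche": 13, "parity": 1, "bmi": 34.0},
]

AGE_CUTOFF = 50  # assumed threshold for illustration only
younger = [r for r in COHORT if r["age"] < AGE_CUTOFF]
older = [r for r in COHORT if r["age"] >= AGE_CUTOFF]

print("younger ranking:", rank_features(toy_risk, younger, COHORT))
print("older ranking:  ", rank_features(toy_risk, older, COHORT))
```

For each respondent, the Shapley values sum to the difference between that respondent's score and the cohort's mean score, which is what makes per-person rankings comparable. Brute-force enumeration is only practical for a handful of features; for a real tree ensemble, an optimized explainer would be the more realistic choice.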

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/041e/10379972/6c76b21c5516/healthcare-11-02000-g001.jpg
