Understanding Cancer Risk Among Bangladeshi Women: An Explainable Machine Learning Approach to Socio-Reproductive Factors Using Tertiary Hospital Data.

Author information

Islam Muhammad Rafiqul, Islam Humayera, Siddiqua Syeda Masuma, Al Ayub Salman Bashar, Saha Beauty, Akter Nargis, Islam Rashedul, Khatun Nazrina, Craver Andrew, Ahsan Habibul

Affiliations

Institute for Population and Precision Health, The University of Chicago, Chicago, IL 60637, USA.

Department of Medical Oncology, National Institute of Cancer Research and Hospital, Dhaka 1212, Bangladesh.

Publication information

Healthcare (Basel). 2025 Jun 15;13(12):1432. doi: 10.3390/healthcare13121432.

Abstract

BACKGROUND

Breast cancer poses a significant health challenge in Bangladesh, where limited screening and unique reproductive patterns contribute to delayed diagnoses and subtype-specific disparities. While reproductive risk factors such as age at menarche, parity, and contraceptive use are well studied in high-income countries, their associations with hormone-receptor-positive (HR+) and triple-negative breast cancer (TNBC) remain underexplored in low-resource settings.

METHODS

A case-control study was conducted at the National Institute of Cancer Research and Hospital (NICRH), including 486 histopathologically confirmed breast cancer cases (246 HR+, 240 TNBC) and 443 cancer-free controls. Socio-demographic and reproductive data were collected through structured interviews. Machine learning models, including Logistic Regression, Lasso, Support Vector Machines, Random Forest, and XGBoost, were trained using stratified five-fold cross-validation. Model performance was evaluated using sensitivity, F1-score, and the area under the receiver operating characteristic curve (AUROC). To interpret model predictions and quantify the contribution of individual features, we used SHapley Additive exPlanations (SHAP) values.
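The evaluation protocol above can be sketched in a few lines. This is a minimal, dependency-free Python illustration, not the authors' code: it assumes a binary label (1 = breast cancer case, 0 = cancer-free control), whereas the study may have evaluated overall or subtype-specific contrasts separately. The helper names (`stratified_k_fold`, `sensitivity`, `f1_score`, `auroc`) are hypothetical; in practice, scikit-learn's `StratifiedKFold`, `recall_score`, `f1_score`, and `roc_auc_score` cover the same ground.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with each class spread evenly across k folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        # Deal each class's indices round-robin so every fold keeps the class ratio.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        yield train, test


def sensitivity(y_true, y_pred):
    """Recall of the positive class: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Class sizes taken from the abstract: 486 cases (label 1) and 443 controls (label 0).
labels = [1] * 486 + [0] * 443
for train, test in stratified_k_fold(labels, k=5):
    n_cases = sum(labels[i] for i in test)
    print(len(test), n_cases)  # each fold holds 97 or 98 cases
```

Dealing each class round-robin into the folds keeps the case:control ratio (about 52:48 here) nearly identical in every fold, so sensitivity, F1, and AUROC are compared across models on equally balanced test sets.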

RESULTS

XGBoost achieved the highest overall performance (F1-score = 0.750), and SHAP-based interpretation revealed key predictors for each subtype. Rural residence, low education (≤5 years), and undernutrition were important predictors across both subtypes. Cesarean delivery and multiple abortions were more predictive of TNBC, while urban residence, employment, and higher education were more predictive of HR+. For HR+, the predictive importance of age at menarche and age at first childbirth decreased as these ages increased, whereas longer intervals between marriage and childbirth were more predictive of TNBC.
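SHAP values are Shapley values from cooperative game theory: each feature's attribution is its average marginal contribution to the model output over all subsets (coalitions) of the other features. The following is a brute-force sketch under simplifying assumptions: features outside a coalition are replaced by a single baseline input (the `shap` library instead averages over background data and uses faster algorithms, such as TreeExplainer for tree ensembles like XGBoost), and `toy_log_odds` is a hypothetical three-feature model with made-up weights, unrelated to the study's fitted models or data.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to a single baseline input.

    Enumerates every coalition of the other features, so it is only
    practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        # Features in the coalition take the instance's value; the rest stay at baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_log_odds(z):
    # Hypothetical log-odds model with an x0*x1 interaction (not the study's model).
    return 0.5 * z[0] + 1.0 * z[1] - 0.3 * z[2] + 0.8 * z[0] * z[1]


phi = shapley_values(toy_log_odds, x=[1, 1, 1], baseline=[0, 0, 0])
print([round(p, 3) for p in phi])  # [0.9, 1.4, -0.3]
```

The interaction term 0.8·x0·x1 is split equally between features 0 and 1, and the attributions sum to f(x) - f(baseline) = 2.0 (the efficiency property). That additivity is what allows SHAP to decompose an individual prediction into per-feature contributions and, aggregated over patients, to rank predictors by importance.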

CONCLUSIONS

Our findings underscore the value of machine learning coupled with SHAP-based explainability in identifying context-specific risk factors for breast cancer subtypes in resource-limited settings. This approach enhances transparency and supports the development of targeted public health interventions to reduce breast cancer disparities in Bangladesh.

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c26e/12192815/bdb907aba51d/healthcare-13-01432-g001.jpg