Association Between Risk Factors and Major Cancers: Explainable Machine Learning Approach.

Author information

Huang Xiayuan, Ren Shushun, Mao Xinyue, Chen Sirui, Chen Elle, He Yuqi, Jiang Yun

Affiliations

Department of Biostatistics, Yale University, New Haven, CT, United States.

School of Nursing, University of Michigan-Ann Arbor, 400 North Ingalls Street, Ann Arbor, MI, 48109, United States, 1 7347633705, 1 7346472416.

Publication information

JMIR Cancer. 2025 May 2;11:e62833. doi: 10.2196/62833.

DOI:10.2196/62833
PMID:40315870
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12064211/
Abstract

BACKGROUND

Cancer is a life-threatening disease and a leading cause of death worldwide, with an estimated 611,000 deaths and over 2 million new cases in the United States in 2024. The rising incidence of major cancers, including among younger individuals, highlights the need for early screening and monitoring of risk factors to manage and decrease cancer risk.

OBJECTIVE

This study aimed to leverage explainable machine learning models to identify and analyze the key risk factors associated with breast, colorectal, lung, and prostate cancers. By uncovering significant associations between risk factors and these major cancer types, we sought to enhance the understanding of cancer diagnosis risk profiles. Our goal was to facilitate more precise screening, early detection, and personalized prevention strategies, ultimately contributing to better patient outcomes and promoting health equity.

METHODS

Deidentified electronic health record data from the Medical Information Mart for Intensive Care (MIMIC)-III were used to identify patients with 4 types of cancer who had longitudinal hospital visits before their cancer diagnosis. Their records were matched and combined with those of patients without cancer diagnoses using propensity scores based on demographic factors. Three models (penalized logistic regression, random forest, and multilayer perceptron [MLP]) were applied to rank the risk factors for each cancer type, with feature importance analysis for the random forest and MLP models. Rank-biased overlap was used to compare the similarity of the ranked risk factors across cancer types.
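Rank-biased overlap (RBO) compares two ranked lists while weighting agreement near the top more heavily than agreement further down. The abstract does not state which RBO variant or which persistence parameter p the study used, so the following is only a minimal Python sketch of the extrapolated RBO described by Webber et al. (2010) for two lists cut to the same depth k: RBO_ext = A_k·p^k + (1 − p)·Σ_{d=1..k} A_d·p^(d−1), where A_d is the fraction of items shared by the two depth-d prefixes. The function name `rbo_ext` and the default p = 0.9 are illustrative assumptions, not the authors' code.

```python
from typing import Hashable, Sequence


def rbo_ext(s: Sequence[Hashable], t: Sequence[Hashable], p: float = 0.9) -> float:
    """Extrapolated rank-biased overlap for two ranked lists (illustrative sketch).

    Both lists are compared only down to depth k = min(len(s), len(t)).
    Items are assumed to be unique within each list. Smaller p puts more
    weight on the top ranks; p must lie strictly between 0 and 1.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be in (0, 1)")
    k = min(len(s), len(t))
    if k == 0:
        return 0.0

    seen_s: set = set()
    seen_t: set = set()
    overlap = 0         # X_d: number of items shared by the two depth-d prefixes
    weighted_sum = 0.0  # running sum of A_d * p**(d - 1), where A_d = X_d / d

    for d in range(1, k + 1):
        a, b = s[d - 1], t[d - 1]
        # Update X_d incrementally from X_(d-1).
        if a == b:
            overlap += 1
        else:
            if a in seen_t:  # new item of s already seen in t
                overlap += 1
            if b in seen_s:  # new item of t already seen in s
                overlap += 1
        seen_s.add(a)
        seen_t.add(b)
        weighted_sum += (overlap / d) * p ** (d - 1)

    # Extrapolate by assuming agreement beyond depth k stays at A_k.
    return (1 - p) * weighted_sum + (overlap / k) * p ** k
```

For example, `rbo_ext(["age", "diabetes", "hyperlipidemia", "anemia"], ["age", "hyperlipidemia", "diabetes", "heart disease"])` scores two hypothetical top-4 risk factor rankings (these lists are invented for illustration and are not taken from the study). Identical rankings score 1 and rankings with no shared items score 0, so lower values indicate a more distinct risk factor profile.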

RESULTS

Our framework evaluated the prediction performance of the explainable machine learning models, and the MLP model performed best, achieving an area under the receiver operating characteristic curve of 0.78 for breast cancer (n=58), 0.76 for colorectal cancer (n=140), 0.84 for lung cancer (n=398), and 0.78 for prostate cancer (n=104) and outperforming the other baseline models (P<.001). Beyond demographic risk factors, the most prominent nontraditional risk factors overlapped across models and cancer types and included hyperlipidemia (odds ratio [OR] 1.14, 95% CI 1.11-1.17; P<.01), diabetes (OR 1.34, 95% CI 1.29-1.39; P<.01), depressive disorders (OR 1.11, 95% CI 1.06-1.16; P<.01), heart diseases (OR 1.42, 95% CI 1.32-1.52; P<.01), and anemia (OR 1.22, 95% CI 1.14-1.30; P<.01). The similarity analysis indicated that lung cancer had a risk factor pattern distinct from those of the other cancer types.
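Odds ratios such as those reported above are commonly obtained by exponentiating a logistic regression coefficient, with a Wald interval exp(β ± z·SE), where z ≈ 1.96 for 95%. The abstract does not say how its intervals were computed, so the following Python sketch simply assumes Wald intervals; the helper names are hypothetical. It converts a coefficient and standard error into an OR with its CI, and it also backs out the approximate log-scale standard error and two-sided P value implied by a reported OR and CI. Because the reported values are rounded, the recovered numbers are approximate.

```python
import math
from statistics import NormalDist

# Two-sided 95% critical value of the standard normal distribution (about 1.96).
Z95 = NormalDist().inv_cdf(0.975)


def odds_ratio_ci(beta: float, se: float, z: float = Z95) -> tuple:
    """Return (OR, lower, upper) from a log-odds coefficient via a Wald interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower: float, upper: float, z: float = Z95) -> float:
    """Log-scale standard error implied by a reported Wald CI (assumes symmetry on the log scale)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def wald_p_value(odds_ratio: float, se: float) -> float:
    """Two-sided P value for H0: OR = 1, using erfc to stay accurate for large |z|."""
    z_stat = math.log(odds_ratio) / se
    return math.erfc(abs(z_stat) / math.sqrt(2))


# Reported hyperlipidemia estimate: OR 1.14, 95% CI 1.11-1.17.
se = se_from_ci(1.11, 1.17)
or_, lo, hi = odds_ratio_ci(math.log(1.14), se)
p = wald_p_value(1.14, se)
```

With these inputs the implied standard error is about 0.013 on the log-odds scale, the round trip reproduces an interval of roughly 1.11 to 1.17, and the implied P value is far below .01, which is consistent with the reported P<.01.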

CONCLUSIONS

The study's findings demonstrated the effectiveness of explainable ML models in assessing nontraditional risk factors for major cancers and highlighted the importance of considering unique risk profiles for different cancer types. Moreover, this research served as a hypothesis-generating foundation, providing preliminary results for future investigation into cancer diagnosis risk analysis and management. Furthermore, expanding collaboration with clinical experts for external validation would be essential to refine model outputs, integrate findings into practice, and enhance their impact on patient care and cancer prevention efforts.



