Bias or biology? Importance of model interpretation in machine learning studies from electronic health records.

Authors

Momenzadeh Amanda, Shamsa Ali, Meyer Jesse G

Affiliation

Department of Biochemistry, Medical College of Wisconsin, Milwaukee, Wisconsin, USA.

Publication

JAMIA Open. 2022 Aug 8;5(3):ooac063. doi: 10.1093/jamiaopen/ooac063. eCollection 2022 Oct.

DOI:10.1093/jamiaopen/ooac063
PMID:35958671
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9360778/
Abstract

OBJECTIVE

The rate of diabetic complication progression varies across individuals, and understanding the factors that alter this rate may uncover new clinical interventions for personalized diabetes management.

MATERIALS AND METHODS

We explore how various machine learning (ML) models and types of electronic health records (EHRs) can predict fast versus slow onset of neuropathy, nephropathy, ocular disease, or cardiovascular disease using only patient data collected prior to diabetes diagnosis.
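The cohort setup described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Patient` structure, the record tuples, and especially the median split used to define "fast" versus "slow" are assumptions made here, since the abstract does not state how the two groups were defined.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from statistics import median


@dataclass
class Patient:
    """Hypothetical patient record; field names are illustrative only."""
    patient_id: str
    diabetes_dx: date       # date of diabetes diagnosis
    complication_dx: date   # date of first complication diagnosis
    records: list           # (date, feature_name, value) tuples


def pre_diagnosis_records(patient):
    """Keep only records dated strictly before the diabetes diagnosis."""
    return [r for r in patient.records if r[0] < patient.diabetes_dx]


def label_fast_slow(patients):
    """Label patients 'fast' or 'slow' by days from diabetes to complication.

    ASSUMPTION: a median split is used purely for illustration.
    """
    days = {p.patient_id: (p.complication_dx - p.diabetes_dx).days for p in patients}
    cutoff = median(days.values())
    return {pid: ("fast" if d <= cutoff else "slow") for pid, d in days.items()}


# Small synthetic cohort (made-up values, not study data).
dx = date(2015, 1, 1)
demo_patients = [
    Patient("a", dx, dx + timedelta(days=100),
            [(date(2014, 6, 1), "hba1c", 6.1), (date(2015, 2, 1), "hba1c", 7.4)]),
    Patient("b", dx, dx + timedelta(days=300), []),
    Patient("c", dx, dx + timedelta(days=700), []),
    Patient("d", dx, dx + timedelta(days=1200), []),
]

if __name__ == "__main__":
    print(label_fast_slow(demo_patients))
    print(pre_diagnosis_records(demo_patients[0]))
```

Filtering to pre-diagnosis records before building features is what keeps the prediction task honest: anything recorded after the diabetes diagnosis could leak information about the outcome.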

RESULTS

We find that optimized random forest models performed best at predicting the diagnosis of a diabetic complication, with the most effective model distinguishing between fast and slow nephropathy (AUROC = 0.75). Combining all data sets gave the highest predictive performance, and among individual data types, social history or laboratory data alone were the most predictive. SHapley Additive exPlanations (SHAP) model interpretation allowed for exploration of predictors of fast and slow complication diagnosis, including underlying biases present in the EHR. Patients in the fast group had more medical visits, incurring a potential informed decision bias.
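To make the AUROC figure and the visit-count concern concrete, here is a small sketch. It computes AUROC as the probability that a randomly chosen fast patient is scored higher than a randomly chosen slow patient, then applies the same function to a utilization proxy (visit count) on its own. All numbers are synthetic, and the idea of scoring visit count alone is an illustrative check suggested here, not a step reported in the abstract.

```python
def auroc(labels, scores):
    """AUROC as P(score of a positive > score of a negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic example: 1 = fast group, 0 = slow group.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
model_scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.6, 0.2, 0.1]  # made-up model outputs
visit_counts = [12, 9, 7, 3, 8, 4, 2, 1]                 # made-up visit counts

if __name__ == "__main__":
    print("model AUROC:", auroc(labels, model_scores))
    print("visit-count-only AUROC:", auroc(labels, visit_counts))
```

If a single utilization feature separates the groups about as well as the full model, that is a signal to inspect the model's predictors for bias before reading them as biology.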

DISCUSSION

Our study is unique among ML studies in that it leverages SHAP as a starting point to explore patient markers not routinely used in diabetes monitoring. Both bias and biological processes likely influence a model's ability to distinguish between groups.
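For readers unfamiliar with what SHAP attributes to each predictor, the sketch below computes exact Shapley values for one patient by enumerating feature coalitions. It is a toy, not the study's method: the three features, the `toy_model` scoring function, and the all-zero baseline are hypothetical, and practical SHAP implementations rely on much faster algorithms for tree models than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for instance x relative to a baseline instance.

    Features outside a coalition are set to their baseline value.
    Cost grows exponentially with the number of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    """Hypothetical risk score over [visits, abnormal_lab, smoker]."""
    visits, abnormal_lab, smoker = z
    return 0.1 * visits + 0.3 * abnormal_lab + 0.2 * smoker * abnormal_lab


x = [4, 1, 1]          # one synthetic patient
baseline = [0, 0, 0]   # assumed reference patient

if __name__ == "__main__":
    phi = shapley_values(toy_model, x, baseline)
    for name, contribution in zip(["visits", "abnormal_lab", "smoker"], phi):
        print(f"{name}: {contribution:.3f}")
```

The attributions sum to the difference between the patient's score and the baseline score, so a large share assigned to a utilization feature such as visit count is directly visible and can be questioned as bias rather than biology.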

CONCLUSION

Overall, model interpretation is a critical step in evaluating the validity of a model's user-intended endpoint when using EHR data, and predictors affected by bias and those driven by biological processes should be equally recognized.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f5c3/9360778/cfaecb86f091/ooac063f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f5c3/9360778/b853185b9009/ooac063f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f5c3/9360778/b887f81a7db0/ooac063f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f5c3/9360778/59cf14e0cd1b/ooac063f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f5c3/9360778/dc9d985bf2db/ooac063f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f5c3/9360778/b8d82bbd3fa9/ooac063f6.jpg
