
Conceptualizing bias in EHR data: A case study in performance disparities by demographic subgroups for a pediatric obesity incidence classifier.

Authors

Campbell Elizabeth A, Bose Saurav, Masino Aaron J

Affiliations

Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America.

Department of Biomedical Informatics, Columbia University Medical Center, New York, New York, United States of America.

Publication information

PLOS Digit Health. 2024 Oct 23;3(10):e0000642. doi: 10.1371/journal.pdig.0000642. eCollection 2024 Oct.

DOI:10.1371/journal.pdig.0000642
PMID:39441784
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11498669/
Abstract

Electronic Health Records (EHRs) are increasingly used to develop machine learning models in predictive medicine. There has been limited research on utilizing machine learning methods to predict childhood obesity and related disparities in classifier performance among vulnerable patient subpopulations. In this work, classification models are developed to recognize pediatric obesity using temporal condition patterns obtained from patient EHR data in a U.S. study population. We trained four machine learning algorithms (Logistic Regression, Random Forest, Gradient Boosted Trees, and Neural Networks) to classify cases and controls as obesity positive or negative, and optimized hyperparameter settings through a bootstrapping methodology. To assess the classifiers for bias, we studied model performance by population subgroups then used permutation analysis to identify the most predictive features for each model and the demographic characteristics of patients with these features. Mean AUC-ROC values were consistent across classifiers, ranging from 0.72-0.80. Some evidence of bias was identified, although this was through the models performing better for minority subgroups (African Americans and patients enrolled in Medicaid). Permutation analysis revealed that patients from vulnerable population subgroups were over-represented among patients with the most predictive diagnostic patterns. We hypothesize that our models performed better on under-represented groups because the features more strongly associated with obesity were more commonly observed among minority patients. These findings highlight the complex ways that bias may arise in machine learning models and can be incorporated into future research to develop a thorough analytical approach to identify and mitigate bias that may arise from features and within EHR datasets when developing more equitable models.
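The two bias checks named in the abstract, AUC-ROC by population subgroup and permutation analysis of features, can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: the function names (`auc_roc`, `auc_by_subgroup`, `permutation_importance`), the subgroup labels "A" and "B", and all toy data below are hypothetical and do not come from the paper.

```python
import random

def auc_roc(y_true, scores):
    """AUC-ROC via the pairwise (Mann-Whitney) identity; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auc_by_subgroup(y_true, scores, groups):
    """Return {subgroup label: AUC-ROC computed on that subgroup only}."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, label in enumerate(groups) if label == g]
        result[g] = auc_roc([y_true[i] for i in idx],
                            [scores[i] for i in idx])
    return result

def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in AUC-ROC when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = auc_roc(y, [predict(row) for row in X])
    drops = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:]
                        for row, v in zip(X, column)]
            total += baseline - auc_roc(y, [predict(r) for r in shuffled])
        drops.append(total / n_repeats)
    return drops

# Hypothetical toy data: labels, classifier scores, and subgroup membership.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.1, 0.8, 0.2, 0.6, 0.7, 0.9, 0.3]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
subgroup_auc = auc_by_subgroup(y_true, scores, groups)

# Hypothetical toy model that scores a patient using only feature 0.
X = [[1, 0], [0, 1], [1, 1], [0, 0], [1, 0], [0, 1], [1, 1], [0, 0]]
y = [row[0] for row in X]
importances = permutation_importance(lambda row: row[0], X, y)
```

In this toy setup `subgroup_auc` shows a gap between the two subgroups, and shuffling the feature the model ignores leaves AUC-ROC unchanged while shuffling the informative feature lowers it. A gap of this kind would then prompt a look at which patients carry the most predictive features, as the abstract describes.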


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a026/11498669/ca01128f0b51/pdig.0000642.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a026/11498669/c6351ae25151/pdig.0000642.g001.jpg

Similar articles

1
Conceptualizing bias in EHR data: A case study in performance disparities by demographic subgroups for a pediatric obesity incidence classifier.
PLOS Digit Health. 2024 Oct 23;3(10):e0000642. doi: 10.1371/journal.pdig.0000642. eCollection 2024 Oct.
2
Developing a FHIR-based EHR phenotyping framework: A case study for identification of patients with obesity and multiple comorbidities from discharge summaries.
J Biomed Inform. 2019 Nov;99:103310. doi: 10.1016/j.jbi.2019.103310. Epub 2019 Oct 14.
3
Machine learning algorithms for outcome prediction in (chemo)radiotherapy: An empirical comparison of classifiers.
Med Phys. 2018 Jul;45(7):3449-3459. doi: 10.1002/mp.12967. Epub 2018 Jun 13.
4
5
Predicting polycystic ovary syndrome with machine learning algorithms from electronic health records.
Front Endocrinol (Lausanne). 2024 Jan 30;15:1298628. doi: 10.3389/fendo.2024.1298628. eCollection 2024.
6
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
7
Development and validation of machine learning models to identify high-risk surgical patients using automatically curated electronic health record data (Pythia): A retrospective, single-site study.
PLoS Med. 2018 Nov 27;15(11):e1002701. doi: 10.1371/journal.pmed.1002701. eCollection 2018 Nov.
8
Can Predictive Modeling Tools Identify Patients at High Risk of Prolonged Opioid Use After ACL Reconstruction?
Clin Orthop Relat Res. 2020 Jul;478(7):0-1618. doi: 10.1097/CORR.0000000000001251.
9
Improving Risk Prediction of Methicillin-Resistant Staphylococcus aureus Using Machine Learning Methods With Network Features: Retrospective Development Study.
JMIR AI. 2024 May 16;3:e48067. doi: 10.2196/48067.
10
Predicting polycystic ovary syndrome (PCOS) with machine learning algorithms from electronic health records.
medRxiv. 2023 Oct 1:2023.07.27.23293255. doi: 10.1101/2023.07.27.23293255.

Cited by

1
A community-based approach to ethical decision-making in artificial intelligence for health care.
JAMIA Open. 2025 Aug 7;8(4):ooaf076. doi: 10.1093/jamiaopen/ooaf076. eCollection 2025 Aug.
2
Principles and implementation strategies for equitable and representative academic partnerships in global health informatics research.
J Am Med Inform Assoc. 2025 May 1;32(5):958-963. doi: 10.1093/jamia/ocaf015.
