A broadly applicable approach to enrich electronic-health-record cohorts by identifying patients with complete data: a multisite evaluation.

Affiliations

Department of Medicine, Massachusetts General Hospital, Boston, MA 02114, United States.

Department of Medicine, Harvard Medical School, Boston, MA 02115, United States.

Publication information

J Am Med Inform Assoc. 2023 Nov 17;30(12):1985-1994. doi: 10.1093/jamia/ocad166.

DOI:10.1093/jamia/ocad166
PMID:37632234
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10654861/
Abstract

OBJECTIVE

Patients who receive most care within a single healthcare system (colloquially called a "loyalty cohort" since they typically return to the same providers) have mostly complete data within that organization's electronic health record (EHR). Loyalty cohorts therefore have low data missingness, which matters because missing data can unintentionally bias research results. Using proxies of routine care and healthcare utilization metrics, we compute a per-patient score that identifies a loyalty cohort.

MATERIALS AND METHODS

We implemented a computable program for the widely adopted i2b2 platform that identifies loyalty cohorts in EHRs based on a machine-learning model, which was previously validated using linked claims data. We developed a novel validation approach, which tests, using only EHR data, whether patients returned to the same healthcare system after the training period. We evaluated these tools at 3 institutions using data from 2017 to 2019.
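As a rough illustration of the two ideas in this paragraph, the sketch below computes a per-patient score from binary proxies of routine care and derives an EHR-only validation label (did the patient return in the follow-up period?). It assumes a simple logistic-style score. The indicator names, weights, and intercept are hypothetical placeholders, not the coefficients of the published model or of the i2b2 implementation.

```python
import math

# Hypothetical binary indicators of routine care and utilization. The weights
# and intercept are made-up illustrative values, not the published model.
WEIGHTS = {
    "two_or_more_visits": 1.2,
    "multiple_medications": 0.9,
    "routine_exam": 0.6,
    "screening_colonoscopy": 0.3,
}
INTERCEPT = -1.5


def loyalty_score(flags):
    """Return a probability-like score in (0, 1) from binary indicator flags."""
    linear = INTERCEPT + sum(w for name, w in WEIGHTS.items() if flags.get(name))
    return 1.0 / (1.0 + math.exp(-linear))


def returned_in_followup(visit_years, followup_year):
    """EHR-only validation label: any recorded visit during the follow-up year."""
    return followup_year in visit_years


# Toy patient: indicators observed in the training period, visits by year.
patient_flags = {"two_or_more_visits": True, "multiple_medications": True}
print(round(loyalty_score(patient_flags), 3))
print(returned_in_followup({2017, 2018, 2019}, 2019))
```

In this toy setup the score is computed only from the training period, and the follow-up label is kept separate so that it can serve as the outcome when the score is evaluated.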

RESULTS

Loyalty cohort calculations to identify patients who returned during a 1-year follow-up yielded a mean area under the receiver operating characteristic curve of 0.77 using the original model and 0.80 after calibrating the model at individual sites. Factors such as multiple medications or visits contributed significantly at all sites. Screening tests' contributions (eg, colonoscopy) varied across sites, likely due to coding and population differences.
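For readers unfamiliar with the metric reported here, the following minimal sketch shows one standard way to compute the area under the receiver operating characteristic curve: the fraction of (returned, did-not-return) patient pairs in which the returning patient has the higher score, with ties counted as one half. The scores and labels are toy values invented for illustration, not study data.

```python
def auroc(scores, labels):
    """AUROC as the probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy data: loyalty scores and whether each patient returned (1) or not (0).
toy_scores = [0.9, 0.8, 0.4, 0.35, 0.2]
toy_labels = [1, 1, 0, 1, 0]
print(round(auroc(toy_scores, toy_labels), 3))
```

A mean AUROC across sites, as reported above, would simply average the per-site values computed this way.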

DISCUSSION

This open-source implementation of a "loyalty score" algorithm had good predictive power. Enriching research cohorts by utilizing these low-missingness patients is a way to obtain the data completeness necessary for accurate causal analysis.
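One way to picture the cohort enrichment described here is to keep only the highest-scoring patients. The sketch below assumes scores are already available per patient. The `top_fraction` cutoff and the patient identifiers are hypothetical; the abstract does not state which threshold sites should use.

```python
def select_loyalty_cohort(scores_by_patient, top_fraction=0.2):
    """Return the IDs of the highest-scoring fraction of patients.

    top_fraction is an illustrative cutoff, not a value taken from the study.
    """
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    ranked = sorted(scores_by_patient, key=scores_by_patient.get, reverse=True)
    keep = max(1, round(len(ranked) * top_fraction))
    return ranked[:keep]


# Toy scores keyed by hypothetical patient IDs.
toy_cohort = {"p1": 0.9, "p2": 0.1, "p3": 0.5, "p4": 0.7, "p5": 0.3}
print(select_loyalty_cohort(toy_cohort, top_fraction=0.4))
```

A fixed score threshold would work equally well; a fraction-based cutoff is used here only because it gives a predictable cohort size.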

CONCLUSION

i2b2 sites can use this approach to select cohorts with mostly complete EHR data.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1166/10654861/4948a9fce140/ocad166f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1166/10654861/29f2bbc8d1f3/ocad166f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1166/10654861/822e7b734199/ocad166f2.jpg

Similar articles

1
A broadly applicable approach to enrich electronic-health-record cohorts by identifying patients with complete data: a multisite evaluation.
J Am Med Inform Assoc. 2023 Nov 17;30(12):1985-1994. doi: 10.1093/jamia/ocad166.
2
Electronic phenotyping of health outcomes of interest using a linked claims-electronic health record database: Findings from a machine learning pilot project.
J Am Med Inform Assoc. 2021 Jul 14;28(7):1507-1517. doi: 10.1093/jamia/ocab036.
3
Dynamic ElecTronic hEalth reCord deTection (DETECT) of individuals at risk of a first episode of psychosis: a case-control development and validation study.
Lancet Digit Health. 2020 May;2(5):e229-e239. doi: 10.1016/S2589-7500(20)30024-8. Epub 2020 Mar 26.
4
Performance of a Machine Learning Algorithm Using Electronic Health Record Data to Identify and Estimate Survival in a Longitudinal Cohort of Patients With Lung Cancer.
JAMA Netw Open. 2021 Jul 1;4(7):e2114723. doi: 10.1001/jamanetworkopen.2021.14723.
5
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
6
Claims-Based Algorithms for Identifying Patients With Pulmonary Hypertension: A Comparison of Decision Rules and Machine-Learning Approaches.
J Am Heart Assoc. 2020 Oct 20;9(19):e016648. doi: 10.1161/JAHA.120.016648. Epub 2020 Sep 29.
7
Development and validation of machine learning models to identify high-risk surgical patients using automatically curated electronic health record data (Pythia): A retrospective, single-site study.
PLoS Med. 2018 Nov 27;15(11):e1002701. doi: 10.1371/journal.pmed.1002701. eCollection 2018 Nov.
8
Health care transformation through collaboration on open-source informatics projects: integrating a medical applications platform, research data repository, and patient summarization.
Interact J Med Res. 2013 May 30;2(1):e11. doi: 10.2196/ijmr.2454.
9
A case study evaluating the portability of an executable computable phenotype algorithm across multiple institutions and electronic health record environments.
J Am Med Inform Assoc. 2018 Nov 1;25(11):1540-1546. doi: 10.1093/jamia/ocy101.
10
Performance of a Machine Learning Algorithm Using Electronic Health Record Data to Predict Postoperative Complications and Report on a Mobile Platform.
JAMA Netw Open. 2022 May 2;5(5):e2211973. doi: 10.1001/jamanetworkopen.2022.11973.

Cited by

1
Precision phenotyping for curating research cohorts of patients with unexplained post-acute sequelae of COVID-19.
Med. 2025 Mar 14;6(3):100532. doi: 10.1016/j.medj.2024.10.009. Epub 2024 Nov 8.
2
Major adverse cardiovascular events' reduction and their association with glucose-lowering medications and glycemic control among patients with type 2 diabetes: A retrospective cohort study using electronic health records.
J Diabetes. 2024 Oct;16(10):e13604. doi: 10.1111/1753-0407.13604.
3
Towards cross-application model-agnostic federated cohort discovery.
J Am Med Inform Assoc. 2024 Oct 1;31(10):2202-2209. doi: 10.1093/jamia/ocae211.
