External Validation of an Algorithm to Identify Patients with High Data-Completeness in Electronic Health Records for Comparative Effectiveness Research.

Author information

Lin Kueiyu Joshua, Rosenthal Gary E, Murphy Shawn N, Mandl Kenneth D, Jin Yinzhu, Glynn Robert J, Schneeweiss Sebastian

Affiliations

Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.

Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.

Publication information

Clin Epidemiol. 2020 Feb 4;12:133-141. doi: 10.2147/CLEP.S232540. eCollection 2020.

DOI:10.2147/CLEP.S232540

PMID:32099479

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7007793/

Abstract

OBJECTIVE

Electronic health record (EHR) data-discontinuity, i.e., receiving care outside a particular EHR system, may cause misclassification of study variables. We aimed to validate an algorithm that identifies patients with high EHR data-continuity in order to reduce such bias.

MATERIALS AND METHODS

We analyzed data from two EHR systems linked with Medicare claims data from 2007 through 2014, one in Massachusetts (MA, n=80,588) and the other in North Carolina (NC, n=33,207). We quantified EHR data-continuity as the Mean Proportion of Encounters Captured (MPEC) by the EHR system, taking the claims data as the complete record of encounters. The prediction model for MPEC was developed in MA and validated in NC. Stratified by predicted EHR data-continuity, we quantified misclassification of 40 key variables by the Mean Standardized Difference (MSD) between the proportions of these variables based on EHR data alone vs the linked claims-EHR data.
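The abstract does not give the formulas behind MPEC and MSD. As a minimal sketch, assume that MPEC averages, over encounter types, the share of claims-recorded encounters that also appear in the EHR, and that MSD is the mean absolute standardized difference between two proportions in the usual pooled-variance form. Under those assumptions the two metrics could be computed as follows. All function names and sample numbers are hypothetical and are not taken from the study.

```python
from math import copysign, inf, sqrt


def proportion_captured(ehr_count, claims_count):
    """Share of claims-recorded encounters that also appear in the EHR."""
    if claims_count == 0:
        return None  # undefined when claims show no encounters of this type
    return min(ehr_count / claims_count, 1.0)


def mpec(encounters):
    """Assumed MPEC: mean capture proportion over encounter types.

    `encounters` maps an encounter type to (ehr_count, claims_count).
    Types with no claims-recorded encounters are skipped.
    """
    props = [proportion_captured(e, c) for e, c in encounters.values()]
    props = [p for p in props if p is not None]
    return sum(props) / len(props) if props else None


def standardized_difference(p1, p2):
    """Standardized difference between two proportions (pooled variance)."""
    pooled = (p1 * (1 - p1) + p2 * (1 - p2)) / 2
    if pooled == 0:
        # Both proportions are 0 or 1: equal means no difference,
        # otherwise the difference is unbounded.
        return 0.0 if p1 == p2 else copysign(inf, p1 - p2)
    return (p1 - p2) / sqrt(pooled)


def mean_standardized_difference(ehr_props, linked_props):
    """Assumed MSD: mean absolute standardized difference across variables."""
    diffs = [
        abs(standardized_difference(ehr_props[v], linked_props[v]))
        for v in ehr_props
    ]
    return sum(diffs) / len(diffs)


# Hypothetical patient: 3 of 10 outpatient and 1 of 2 inpatient encounters captured.
patient = {"outpatient": (3, 10), "inpatient": (1, 2)}
print(mpec(patient))

# Hypothetical prevalence of two variables: EHR alone vs linked claims-EHR data.
ehr_only = {"diabetes": 0.20, "heart_failure": 0.05}
linked = {"diabetes": 0.40, "heart_failure": 0.10}
print(mean_standardized_difference(ehr_only, linked))
```

A smaller MSD in a stratum means the EHR-only proportions sit closer to those from the linked data, which is how the abstract frames reduced misclassification.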

RESULTS

The mean MPEC was 27% in the MA system and 26% in the NC system. Predicted and observed EHR data-continuity were highly correlated (Spearman correlation = 0.78 in MA and 0.73 in NC). Misclassification (MSD) of the 40 variables was significantly smaller (44%, 95% CI: 40-48%) in patients of the predicted high EHR data-continuity cohort than in the remaining population.
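For illustration, a Spearman correlation such as the one reported here is the Pearson correlation of the ranks of the two series, with tied values sharing their average rank. The dependency-free sketch below shows that calculation. The function names and the input values are hypothetical and are not data from the study.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical predicted vs observed continuity values for four patients.
predicted = [0.10, 0.25, 0.40, 0.70]
observed = [0.05, 0.45, 0.30, 0.80]
print(spearman(predicted, observed))
```

Because only ranks enter the calculation, the statistic measures whether the model orders patients correctly by continuity, not whether the predicted values match the observed ones in magnitude.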

DISCUSSION

The comorbidity profiles were similar in patients with high vs low EHR data-continuity. Therefore, restricting an analysis to patients with high EHR data-continuity may reduce information bias while preserving the representativeness of the study cohort.

CONCLUSION

We have successfully validated an algorithm that can identify a high EHR data-continuity cohort representative of the source population.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8afa/7007793/64c7075cb117/CLEP-12-133-g0001.jpg
