Missing data matter: an empirical evaluation of the impacts of missing EHR data in comparative effectiveness research.

Author affiliations

Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, USA.

Department of Pediatrics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA.

Publication information

J Am Med Inform Assoc. 2023 Jun 20;30(7):1246-1256. doi: 10.1093/jamia/ocad066.

DOI:10.1093/jamia/ocad066

PMID:37337922

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10280351/

Abstract

OBJECTIVES

The impacts of missing data in comparative effectiveness research (CER) using electronic health records (EHRs) may vary depending on the type and pattern of missing data. In this study, we aimed to quantify these impacts and compare the performance of different imputation methods.

MATERIALS AND METHODS

We conducted an empirical (simulation) study to quantify the bias and power loss in estimating treatment effects in CER using EHR data. We considered various missing-data scenarios and used propensity scores to control for confounding. We compared the performance of multiple imputation and spline smoothing as methods for handling missing data.
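The two quantities named above, bias and power loss, can be illustrated with a small helper that summarizes simulation replicates. This is a minimal sketch, not the authors' code: the function names, the two-sided Wald test at the 0.05 level (z_crit = 1.96), and the example numbers are all assumptions made for illustration, since the abstract does not state how bias and power were computed.

```python
from statistics import mean


def summarize_replicates(estimates, std_errors, true_effect, z_crit=1.96):
    """Summarize simulated treatment-effect estimates (hypothetical helper).

    bias  = mean(estimates) - true_effect
    power = share of replicates whose Wald statistic |estimate / SE| exceeds
            z_crit (assumed here: two-sided test at the 0.05 level).
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("estimates and std_errors must be non-empty and equal in length")
    bias = mean(estimates) - true_effect
    rejected = [abs(est / se) > z_crit for est, se in zip(estimates, std_errors)]
    power = sum(rejected) / len(rejected)
    return bias, power


def power_loss(power_complete, power_after_imputation):
    """Power lost relative to the analysis with no missing data."""
    return power_complete - power_after_imputation


# Illustrative numbers only; they are not results from the paper.
bias, power = summarize_replicates(
    estimates=[0.5, 0.3, 0.1, 0.5],
    std_errors=[0.1, 0.1, 0.1, 0.5],
    true_effect=0.4,
)
```

In a full simulation, the same summary would be computed once on complete data and once after each imputation method, and the two power values compared with power_loss.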

RESULTS

When missing data depended on the stochastic progression of disease and medical practice patterns, the spline smoothing method produced results close to those obtained when there were no missing data. Compared with multiple imputation, spline smoothing generally performed similarly or better, with smaller estimation bias and less power loss. Multiple imputation can still reduce study bias and power loss in some restrictive scenarios, for example, when missing data did not depend on the stochastic process of disease progression.

DISCUSSION AND CONCLUSION

Missing data in EHRs could lead to biased estimates of treatment effects and false negative findings in CER even after missing data were imputed. It is important to leverage the temporal information of disease trajectory to impute missing values when using EHRs as a data resource for CER and to consider the missing rate and the effect size when choosing an imputation method.
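To make the idea of using temporal information concrete, the sketch below fills in a missing measurement from a patient's own observed trajectory. It is a simplified stand-in under stated assumptions: it uses a natural cubic spline that interpolates the observed visits exactly, not the smoothing spline evaluated in the study, and the function name, visit days, and lab values are hypothetical.

```python
from bisect import bisect_right


def natural_cubic_spline(times, values):
    """Return S(t), the natural cubic spline through (times, values).

    times must be strictly increasing. This interpolates the observed
    points exactly; a smoothing spline would instead trade fit for
    smoothness.
    """
    n = len(times) - 1
    if n < 1:
        raise ValueError("need at least two observed visits")
    h = [times[i + 1] - times[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives at the knots; the ends stay 0
    if n >= 2:
        # Tridiagonal system for the interior second derivatives.
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        rhs = [
            6.0 * ((values[i + 1] - values[i]) / h[i]
                   - (values[i] - values[i - 1]) / h[i - 1])
            for i in range(1, n)
        ]
        # Thomas algorithm: forward elimination, then back substitution.
        for k in range(1, n - 1):
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        sol = [0.0] * (n - 1)
        sol[-1] = rhs[-1] / diag[-1]
        for k in range(n - 3, -1, -1):
            sol[k] = (rhs[k] - h[k + 1] * sol[k + 1]) / diag[k]
        m[1:n] = sol

    def spline(t):
        # Find the interval containing t, clamped to the first or last one.
        i = min(max(bisect_right(times, t) - 1, 0), n - 1)
        a, b = times[i + 1] - t, t - times[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (values[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (values[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return spline


# Hypothetical patient: a lab value observed on days 0, 30, 90 and 180,
# with the day-60 measurement missing.
trajectory = natural_cubic_spline([0.0, 30.0, 90.0, 180.0], [8.1, 7.6, 7.2, 6.9])
imputed_day_60 = trajectory(60.0)
```

Unlike a cross-sectional imputation model, the imputed value here depends on when the neighbouring measurements were taken, which is the kind of trajectory information the conclusion recommends using.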

Similar articles

A comparison of different methods to handle missing data in the context of propensity score analysis.

Eur J Epidemiol. 2019 Jan;34(1):23-36. doi: 10.1007/s10654-018-0447-z. Epub 2018 Oct 19.

Propensity score analysis with partially observed covariates: How should multiple imputation be used?

Stat Methods Med Res. 2019 Jan;28(1):3-19. doi: 10.1177/0962280217713032. Epub 2017 Jun 2.

Imputation strategies when a continuous outcome is to be dichotomized for responder analysis: a simulation study.

BMC Med Res Methodol. 2019 Jul 23;19(1):161. doi: 10.1186/s12874-019-0793-x.

Multiple imputation for handling missing outcome data when estimating the relative risk.

BMC Med Res Methodol. 2017 Sep 6;17(1):134. doi: 10.1186/s12874-017-0414-5.

Inverse Probability of Treatment Weighting and Confounder Missingness in Electronic Health Record-based Analyses: A Comparison of Approaches Using Plasmode Simulation.

Epidemiology. 2023 Jul 1;34(4):520-530. doi: 10.1097/EDE.0000000000001618. Epub 2023 Apr 26.

Comparison of methods for handling covariate missingness in propensity score estimation with a binary exposure.

BMC Med Res Methodol. 2020 Jun 26;20(1):168. doi: 10.1186/s12874-020-01053-4.

Multiple imputation with missing indicators as proxies for unmeasured variables: simulation study.

BMC Med Res Methodol. 2020 Jul 8;20(1):185. doi: 10.1186/s12874-020-01068-x.

Multiple imputation for systematically missing confounders within a distributed data drug safety network: A simulation study and real-world example.

Pharmacoepidemiol Drug Saf. 2020 Jan;29 Suppl 1:35-44. doi: 10.1002/pds.4876. Epub 2019 Sep 4.

Missing Data in Marginal Structural Models: A Plasmode Simulation Study Comparing Multiple Imputation and Inverse Probability Weighting.

Med Care. 2019 Mar;57(3):237-243. doi: 10.1097/MLR.0000000000001063.

Cited by

Benchmarking Missing Data Imputation Methods for Time Series Using Real-World Test Cases.

Proc Mach Learn Res. 2025 Jun;287:480-501.

Predicting Early-Onset Colorectal Cancer in Individuals Below Screening Age Using Machine Learning and Real-World Data: Case Control Study.

JMIR Cancer. 2025 Jun 19;11:e64506. doi: 10.2196/64506.

A probabilistic approach for building disease phenotypes across electronic health records.

BioData Min. 2025 Jun 11;18(1):39. doi: 10.1186/s13040-025-00454-9.

Evaluation of the optimal timing for advanced airway management for adult patients with out-of-hospital cardiac arrest: A retrospective observational study from a multicenter registry.

Resusc Plus. 2025 Apr 15;23:100957. doi: 10.1016/j.resplu.2025.100957. eCollection 2025 May.

Fast and interpretable mortality risk scores for critical care patients.

J Am Med Inform Assoc. 2025 Apr 1;32(4):736-747. doi: 10.1093/jamia/ocae318.

V3+ extends the V3 framework to ensure user-centricity and scalability of sensor-based digital health technologies.

NPJ Digit Med. 2025 Jan 24;8(1):51. doi: 10.1038/s41746-024-01322-2.

Moving Beyond Medical Statistics: A Systematic Review on Missing Data Handling in Electronic Health Records.

Health Data Sci. 2024 Dec 4;4:0176. doi: 10.34133/hds.0176. eCollection 2024.

Reliable generation of privacy-preserving synthetic electronic health record time series via diffusion models.

J Am Med Inform Assoc. 2024 Nov 1;31(11):2529-2539. doi: 10.1093/jamia/ocae229.

A Customized Human Mitochondrial DNA Database (hMITO DB v1.0) for Rapid Sequence Analysis, Haplotyping and Geo-Mapping.

Int J Mol Sci. 2023 Aug 31;24(17):13505. doi: 10.3390/ijms241713505.
