Missing data matter: an empirical evaluation of the impacts of missing EHR data in comparative effectiveness research.

Author affiliations

Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, USA.

Department of Pediatrics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA.

Publication information

J Am Med Inform Assoc. 2023 Jun 20;30(7):1246-1256. doi: 10.1093/jamia/ocad066.

Abstract

OBJECTIVES

The impacts of missing data in comparative effectiveness research (CER) using electronic health records (EHRs) may vary depending on the type and pattern of missing data. In this study, we aimed to quantify these impacts and compare the performance of different imputation methods.

MATERIALS AND METHODS

We conducted an empirical (simulation) study to quantify the bias and power loss in estimating treatment effects in CER using EHR data. We considered various missing-data scenarios and used propensity scores to control for confounding. We compared the performance of multiple imputation and spline smoothing for handling missing data.
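To make the role of propensity scores concrete, here is a minimal sketch in plain Python. It assumes inverse-probability weighting on a single binary confounder. The abstract does not say how the scores were estimated or applied, so the function names, the toy cohort, and the weighting scheme are all illustrative assumptions rather than the study's procedure.

```python
from collections import defaultdict


def estimate_propensity(records):
    """Estimate P(treated | x) as the treated fraction within each level of x.

    Each record is a tuple (x, treated, outcome) with treated in {0, 1}.
    """
    counts = defaultdict(lambda: [0, 0])  # x -> [n_treated, n_total]
    for x, treated, _ in records:
        counts[x][0] += treated
        counts[x][1] += 1
    return {x: n_treated / n_total for x, (n_treated, n_total) in counts.items()}


def naive_difference(records):
    """Unadjusted difference in mean outcome, treated minus control."""
    treated = [y for _, t, y in records if t == 1]
    control = [y for _, t, y in records if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ipw_difference(records):
    """Weighted difference in means using inverse-probability weights.

    Assumes every level of x contains both treated and control patients,
    so that each estimated score lies strictly between 0 and 1.
    """
    ps = estimate_propensity(records)
    num = {0: 0.0, 1: 0.0}
    den = {0: 0.0, 1: 0.0}
    for x, t, y in records:
        weight = 1 / ps[x] if t == 1 else 1 / (1 - ps[x])
        num[t] += weight * y
        den[t] += weight
    return num[1] / den[1] - num[0] / den[0]


def make_records():
    """Hypothetical cohort: outcome = 1.0 * treated + 2.0 * x, with treatment
    more common when x == 1, so x confounds the comparison."""
    records = []
    for x, n_treated, n_control in [(0, 2, 6), (1, 6, 2)]:
        records += [(x, 1, 1.0 + 2.0 * x)] * n_treated
        records += [(x, 0, 2.0 * x)] * n_control
    return records


records = make_records()
print("naive estimate:", naive_difference(records))
print("weighted estimate:", ipw_difference(records))
```

In this toy cohort the true treatment effect is 1.0 by construction. The unadjusted comparison mixes in the effect of the confounder, while the weighted comparison removes it. If the confounder were partly missing, the scores themselves would have to be built from imputed values, which is where the choice of imputation method starts to matter.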

RESULTS

When missing data depended on the stochastic progression of disease and medical practice patterns, the spline smoothing method produced results that were close to those obtained when there were no missing data. Compared to multiple imputation, spline smoothing generally performed similarly or better, with smaller estimation bias and less power loss. Multiple imputation could still reduce study bias and power loss in some restrictive scenarios, e.g., when missing data did not depend on the stochastic process of disease progression.

DISCUSSION AND CONCLUSION

Missing data in EHRs could lead to biased estimates of treatment effects and false negative findings in CER even after missing data were imputed. It is important to leverage the temporal information of disease trajectory to impute missing values when using EHRs as a data resource for CER and to consider the missing rate and the effect size when choosing an imputation method.
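The point about temporal information can be illustrated with a small sketch. It uses linear interpolation over visit times as a deliberately simpler stand-in for spline smoothing, and a single fill with the observed mean as a crude time-agnostic baseline (not multiple imputation). The function names and the toy trajectory are hypothetical and are not taken from the study.

```python
from statistics import mean


def impute_by_time_interpolation(times, values):
    """Fill None entries by linear interpolation between the nearest observed
    visits. Assumes `times` is sorted in ascending order."""
    observed = [(t, v) for t, v in zip(times, values) if v is not None]
    if not observed:
        raise ValueError("at least one observed value is required")
    filled = []
    for t, v in zip(times, values):
        if v is not None:
            filled.append(v)
            continue
        before = [(tt, vv) for tt, vv in observed if tt < t]
        after = [(tt, vv) for tt, vv in observed if tt > t]
        if before and after:
            (t0, v0), (t1, v1) = before[-1], after[0]
            filled.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
        else:
            # Gap at the start or end: carry the nearest observed value.
            filled.append(before[-1][1] if before else after[0][1])
    return filled


def impute_by_observed_mean(values):
    """Fill None entries with the mean of the observed values, ignoring time."""
    observed_mean = mean(v for v in values if v is not None)
    return [observed_mean if v is None else v for v in values]


def max_abs_error(estimate, truth):
    """Largest absolute deviation between imputed and true values."""
    return max(abs(e - y) for e, y in zip(estimate, truth))


# Hypothetical lab trajectory that drifts upward over ten visits,
# with three of the visits unrecorded.
times = list(range(10))
truth = [6.0 + 0.2 * t for t in times]
observed = [None if t in (3, 4, 7) else y for t, y in zip(times, truth)]

temporal = impute_by_time_interpolation(times, observed)
flat = impute_by_observed_mean(observed)

print("time-aware error:", max_abs_error(temporal, truth))
print("mean-fill error:", max_abs_error(flat, truth))
```

Because the toy trajectory is exactly linear, interpolation in time recovers the missing visits almost perfectly, while the mean fill pulls every gap toward the same value regardless of when the visit occurred. Real trajectories are noisier and nonlinear, which is the setting where a smoothing spline would be a more natural choice than straight-line interpolation.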

