Missing data matter: an empirical evaluation of the impacts of missing EHR data in comparative effectiveness research.

Affiliations

Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, USA.

Department of Pediatrics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA.

Publication details

J Am Med Inform Assoc. 2023 Jun 20;30(7):1246-1256. doi: 10.1093/jamia/ocad066.

DOI: 10.1093/jamia/ocad066
PMID: 37337922
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10280351/
Abstract

OBJECTIVES

The impacts of missing data in comparative effectiveness research (CER) using electronic health records (EHRs) may vary depending on the type and pattern of missing data. In this study, we aimed to quantify these impacts and compare the performance of different imputation methods.

MATERIALS AND METHODS

We conducted an empirical (simulation) study to quantify the bias and power loss in estimating treatment effects in CER using EHR data. We considered various missing-data scenarios and used propensity scores to control for confounding. We compared the performance of multiple imputation and spline smoothing for handling missing data.
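The abstract says only that propensity scores were used to control for confounding; it does not say whether they were used for matching, stratification, or weighting. As a minimal sketch, assuming inverse probability of treatment weighting (IPTW) and a single discrete confounder, the code below estimates each patient's propensity score as the share treated in their covariate stratum and reweights outcomes. The function names and toy data are hypothetical, not the authors' code or data. It also shows why missing data matter here: a patient whose stratum value is missing cannot be assigned a propensity score until that value is imputed.

```python
from collections import defaultdict
from typing import Dict, List


def stratum_propensity(strata: List[int], treated: List[int]) -> Dict[int, float]:
    """Estimate e(x) = P(treated | x) as the share of treated patients in each
    covariate stratum (the maximum-likelihood fit of a saturated model)."""
    counts: Dict[int, List[int]] = defaultdict(lambda: [0, 0])  # [n_treated, n_total]
    for s, a in zip(strata, treated):
        counts[s][0] += a
        counts[s][1] += 1
    return {s: n_t / n for s, (n_t, n) in counts.items()}


def iptw_ate(strata: List[int], treated: List[int], outcome: List[float]) -> float:
    """Average treatment effect via normalized inverse probability of
    treatment weighting. Requires 0 < e(x) < 1 in every stratum (positivity)."""
    e = stratum_propensity(strata, treated)
    weighted_sum = {0: 0.0, 1: 0.0}
    weight_total = {0: 0.0, 1: 0.0}
    for s, a, y in zip(strata, treated, outcome):
        w = 1.0 / e[s] if a == 1 else 1.0 / (1.0 - e[s])
        weighted_sum[a] += w * y
        weight_total[a] += w
    return weighted_sum[1] / weight_total[1] - weighted_sum[0] / weight_total[0]


def naive_difference(treated: List[int], outcome: List[float]) -> float:
    """Unadjusted difference in mean outcome, treated minus untreated."""
    y1 = [y for a, y in zip(treated, outcome) if a == 1]
    y0 = [y for a, y in zip(treated, outcome) if a == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)


# Toy data (made up): sicker patients (stratum 1) are treated more often and
# have higher outcome values; the true treatment effect is +1.0.
strata: List[int] = []
treated: List[int] = []
outcome: List[float] = []
for s, n_treated, n_control in [(0, 2, 6), (1, 6, 2)]:
    for a, n in [(1, n_treated), (0, n_control)]:
        strata += [s] * n
        treated += [a] * n
        outcome += [1.0 + 2.0 * s + 1.0 * a] * n

print(f"{naive_difference(treated, outcome):.2f}")  # 2.00 (confounded)
print(f"{iptw_ate(strata, treated, outcome):.2f}")  # 1.00
```

The unadjusted comparison doubles the effect because treatment is concentrated among sicker patients; weighting by the inverse propensity score recovers the true effect of 1.0 in this noise-free example.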

RESULTS

When missing data depended on the stochastic progression of disease and medical practice patterns, the spline smoothing method produced results that were close to those obtained when there were no missing data. Compared to multiple imputation, spline smoothing generally performed similarly or better, with smaller estimation bias and less power loss. Multiple imputation can still reduce study bias and power loss in some restrictive scenarios, e.g., when missing data did not depend on the stochastic process of disease progression.
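The abstract does not give the details of the spline smoothing method (spline basis, smoothing parameter, or how fits are shared across patients). As a minimal sketch of the underlying idea, filling a gap from the patient's own trajectory, the code below imputes missing measurements in one patient's series with an interpolating natural cubic spline through the observed points. This is a simplification: it is the limiting case of a cubic smoothing spline as the smoothing penalty goes to zero. The function name is hypothetical; it assumes observation times are distinct, and it leaves values outside the observed time range missing rather than extrapolating.

```python
from typing import List, Optional


def natural_cubic_spline_impute(
    times: List[float], values: List[Optional[float]]
) -> List[Optional[float]]:
    """Fill None entries of one patient's series with a natural cubic spline
    through the observed (time, value) points."""
    obs = sorted((t, v) for t, v in zip(times, values) if v is not None)
    if len(obs) < 2:
        return list(values)  # not enough observed points to fit a curve
    xs = [t for t, _ in obs]
    ys = [v for _, v in obs]
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]

    # Second derivatives M[0..n]; natural boundary conditions fix M[0] = M[n] = 0.
    m = [0.0] * (n + 1)
    if n > 1:
        # Tridiagonal system for the interior unknowns M[1..n-1].
        sub = [h[i - 1] for i in range(1, n)]
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        sup = [h[i] for i in range(1, n)]
        rhs = [
            6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
            for i in range(1, n)
        ]
        # Thomas algorithm: forward elimination ...
        for k in range(1, n - 1):
            w = sub[k] / diag[k - 1]
            diag[k] -= w * sup[k - 1]
            rhs[k] -= w * rhs[k - 1]
        # ... then back substitution.
        sol = [0.0] * (n - 1)
        sol[-1] = rhs[-1] / diag[-1]
        for k in range(n - 3, -1, -1):
            sol[k] = (rhs[k] - sup[k] * sol[k + 1]) / diag[k]
        m[1:n] = sol

    def spline_at(t: float) -> float:
        # Find the interval [xs[i], xs[i+1]] that contains t.
        i = 0
        while i < n - 1 and t > xs[i + 1]:
            i += 1
        a, b = xs[i + 1] - t, t - xs[i]
        return (
            m[i] * a ** 3 / (6 * h[i])
            + m[i + 1] * b ** 3 / (6 * h[i])
            + (ys[i] / h[i] - m[i] * h[i] / 6) * a
            + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6) * b
        )

    filled: List[Optional[float]] = []
    for t, v in zip(times, values):
        if v is not None:
            filled.append(v)  # observed values are kept unchanged
        elif xs[0] <= t <= xs[-1]:
            filled.append(spline_at(t))
        else:
            filled.append(None)  # outside the observed range: no extrapolation
    return filled


# Hypothetical lab values for one patient, with the t = 2 measurement missing.
print(natural_cubic_spline_impute([0.0, 1.0, 2.0, 3.0, 4.0], [0.0, 1.0, None, 9.0, 16.0]))
```

Here the gap at t = 2 is filled with a value of about 3.9, following the curvature of the neighboring measurements; a cross-sectional approach such as mean imputation would ignore that ordering in time.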

DISCUSSION AND CONCLUSION

Missing data in EHRs could lead to biased estimates of treatment effects and false negative findings in CER even after missing data were imputed. It is important to leverage the temporal information of disease trajectory to impute missing values when using EHRs as a data resource for CER and to consider the missing rate and the effect size when choosing an imputation method.
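The abstract reports results as estimation bias and power loss without defining them. Assuming the common simulation-study definitions (bias as the mean estimate across replicates minus the true effect; power as the share of replicates in which a two-sided Wald test at α = 0.05 rejects a null effect), both can be computed from replicate results as sketched below. The function names and numbers are hypothetical, and the paper's exact definitions (for example, relative rather than absolute bias) may differ.

```python
from statistics import mean
from typing import List


def bias(estimates: List[float], true_effect: float) -> float:
    """Average estimation error across simulation replicates."""
    return mean(estimates) - true_effect


def empirical_power(estimates: List[float], std_errors: List[float],
                    z_crit: float = 1.96) -> float:
    """Share of replicates whose two-sided Wald test rejects H0: effect = 0
    at alpha = 0.05, i.e. |estimate / SE| > 1.96."""
    rejections = [abs(b / se) > z_crit for b, se in zip(estimates, std_errors)]
    return sum(rejections) / len(rejections)


# Hypothetical replicate results for one missing-data scenario.
est = [0.8, 1.0, 1.2, 0.6]
se = [0.3, 0.3, 0.3, 0.4]
print(round(bias(est, true_effect=1.0), 3))  # -0.1
print(empirical_power(est, se))              # 0.75
```

Power loss for a scenario could then be taken as the empirical power with complete data minus the empirical power after missing values are imputed; each replicate that fails to reject despite a true nonzero effect is one of the false negative findings the authors warn about.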


Similar articles

1
A comparison of different methods to handle missing data in the context of propensity score analysis.
Eur J Epidemiol. 2019 Jan;34(1):23-36. doi: 10.1007/s10654-018-0447-z. Epub 2018 Oct 19.
2
Propensity score analysis with partially observed covariates: How should multiple imputation be used?
Stat Methods Med Res. 2019 Jan;28(1):3-19. doi: 10.1177/0962280217713032. Epub 2017 Jun 2.
3
Imputation strategies when a continuous outcome is to be dichotomized for responder analysis: a simulation study.
BMC Med Res Methodol. 2019 Jul 23;19(1):161. doi: 10.1186/s12874-019-0793-x.
4
Multiple imputation for handling missing outcome data when estimating the relative risk.
BMC Med Res Methodol. 2017 Sep 6;17(1):134. doi: 10.1186/s12874-017-0414-5.
5
Inverse Probability of Treatment Weighting and Confounder Missingness in Electronic Health Record-based Analyses: A Comparison of Approaches Using Plasmode Simulation.
Epidemiology. 2023 Jul 1;34(4):520-530. doi: 10.1097/EDE.0000000000001618. Epub 2023 Apr 26.
6
Comparison of methods for handling covariate missingness in propensity score estimation with a binary exposure.
BMC Med Res Methodol. 2020 Jun 26;20(1):168. doi: 10.1186/s12874-020-01053-4.
7
Multiple imputation with missing indicators as proxies for unmeasured variables: simulation study.
BMC Med Res Methodol. 2020 Jul 8;20(1):185. doi: 10.1186/s12874-020-01068-x.
8
Multiple imputation for systematically missing confounders within a distributed data drug safety network: A simulation study and real-world example.
Pharmacoepidemiol Drug Saf. 2020 Jan;29 Suppl 1:35-44. doi: 10.1002/pds.4876. Epub 2019 Sep 4.
9
Missing Data in Marginal Structural Models: A Plasmode Simulation Study Comparing Multiple Imputation and Inverse Probability Weighting.
Med Care. 2019 Mar;57(3):237-243. doi: 10.1097/MLR.0000000000001063.

Cited by

1
Benchmarking Missing Data Imputation Methods for Time Series Using Real-World Test Cases.
Proc Mach Learn Res. 2025 Jun;287:480-501.
2
Predicting Early-Onset Colorectal Cancer in Individuals Below Screening Age Using Machine Learning and Real-World Data: Case Control Study.
JMIR Cancer. 2025 Jun 19;11:e64506. doi: 10.2196/64506.
3
A probabilistic approach for building disease phenotypes across electronic health records.
BioData Min. 2025 Jun 11;18(1):39. doi: 10.1186/s13040-025-00454-9.
4
Evaluation of the optimal timing for advanced airway management for adult patients with out-of-hospital cardiac arrest: A retrospective observational study from a multicenter registry.
Resusc Plus. 2025 Apr 15;23:100957. doi: 10.1016/j.resplu.2025.100957. eCollection 2025 May.
5
Fast and interpretable mortality risk scores for critical care patients.
J Am Med Inform Assoc. 2025 Apr 1;32(4):736-747. doi: 10.1093/jamia/ocae318.
6
V3+ extends the V3 framework to ensure user-centricity and scalability of sensor-based digital health technologies.
NPJ Digit Med. 2025 Jan 24;8(1):51. doi: 10.1038/s41746-024-01322-2.
7
Moving Beyond Medical Statistics: A Systematic Review on Missing Data Handling in Electronic Health Records.
Health Data Sci. 2024 Dec 4;4:0176. doi: 10.34133/hds.0176. eCollection 2024.
8
Reliable generation of privacy-preserving synthetic electronic health record time series via diffusion models.
J Am Med Inform Assoc. 2024 Nov 1;31(11):2529-2539. doi: 10.1093/jamia/ocae229.
9
A Customized Human Mitochondrial DNA Database (hMITO DB v1.0) for Rapid Sequence Analysis, Haplotyping and Geo-Mapping.
Int J Mol Sci. 2023 Aug 31;24(17):13505. doi: 10.3390/ijms241713505.
