Inverse Probability of Treatment Weighting and Confounder Missingness in Electronic Health Record-based Analyses: A Comparison of Approaches Using Plasmode Simulation.
Affiliations
From the Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA.
Division of Hematology and Oncology, University of Pennsylvania, Philadelphia, PA.
Publication details
Epidemiology. 2023 Jul 1;34(4):520-530. doi: 10.1097/EDE.0000000000001618. Epub 2023 Apr 26.
BACKGROUND
Electronic health record (EHR) data represent a critical resource for comparative effectiveness research, allowing investigators to study intervention effects in real-world settings with large patient samples. However, high levels of missingness in confounder variables are common, challenging the perceived validity of EHR-based investigations.
METHODS
We investigated the performance of multiple imputation and propensity score (PS) calibration when conducting inverse probability of treatment weighting (IPTW)-based comparative effectiveness research using EHR data with missingness in confounder variables and outcome misclassification. Our motivating example compared the effectiveness of immunotherapy versus chemotherapy for advanced bladder cancer, with missingness in a key prognostic variable. We captured the complexity of EHR data structures using a plasmode simulation approach, spiking investigator-defined effects into resamples of a cohort of 4361 patients from a nationwide deidentified EHR-derived database. We characterized the statistical properties of IPTW hazard ratio estimates when missingness was handled with multiple imputation or PS calibration.
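To make the plasmode idea concrete, the Python sketch below resamples patients with replacement, keeps their observed treatment and covariates, and spikes in an investigator-chosen treatment effect by simulating exponential (proportional hazards) event times with administrative censoring. It is a minimal illustration under stated assumptions, not the authors' simulation code: the function name `plasmode_resample`, the record layout, the exponential baseline hazard, and the toy cohort are all hypothetical. In a typical plasmode design the baseline hazard and covariate coefficients would come from an outcome model fit to the real cohort; here they are simply passed in.

```python
import math
import random

def plasmode_resample(cohort, n, log_hr, base_hazard, covariate_effects,
                      max_follow_up, rng):
    """Hypothetical plasmode step: resample real patients, simulate only the outcome.

    cohort: list of dicts with "treated" (0/1) and "x" (list of covariate values).
    Treatment and covariates are kept as observed; the event time is drawn from an
    exponential proportional hazards model whose treatment log HR is set by the investigator.
    """
    resampled = [rng.choice(cohort) for _ in range(n)]  # sample patients with replacement
    simulated = []
    for patient in resampled:
        linear_predictor = log_hr * patient["treated"] + sum(
            b * x for b, x in zip(covariate_effects, patient["x"]))
        rate = base_hazard * math.exp(linear_predictor)  # patient-specific constant hazard
        event_time = rng.expovariate(rate)  # exponential event time
        simulated.append({
            "treated": patient["treated"],
            "x": patient["x"],
            "time": min(event_time, max_follow_up),  # administrative censoring
            "event": int(event_time <= max_follow_up),
        })
    return simulated

# Toy stand-in for a real cohort (invented data, one covariate per patient)
rng = random.Random(2023)
toy_cohort = [{"treated": rng.randint(0, 1), "x": [rng.gauss(0, 1)]} for _ in range(500)]
sim = plasmode_resample(toy_cohort, n=1000, log_hr=math.log(0.8), base_hazard=0.05,
                        covariate_effects=[0.5], max_follow_up=24.0, rng=rng)
```

Because the hazard ratio is non-collapsible, the conditional effect spiked in here (`log_hr`) generally differs from the marginal hazard ratio that IPTW targets, so the true marginal value is usually obtained separately, for example by simulating outcomes for a large sample under both treatment assignments.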
RESULTS
Multiple imputation and PS calibration performed similarly, maintaining ≤0.05 absolute bias in the marginal hazard ratio even when ≥50% of subjects had missing at random or missing not at random confounder data. Multiple imputation required greater computational resources, taking nearly 40 times as long as PS calibration to complete. Outcome misclassification increased the bias of both methods only minimally.
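The bias figures above summarize estimates across many simulated replicates, and with multiple imputation each replicate's estimate is itself pooled across imputed datasets. A minimal Python sketch of both steps follows, assuming Rubin's rules are applied on the log hazard ratio scale and that absolute bias is measured as the distance between the mean estimated hazard ratio and the true marginal hazard ratio. The paper's exact bias definition may differ, and the function names and numbers are hypothetical.

```python
import math
import statistics

def pool_rubin(estimates, variances):
    """Pool M per-imputation estimates (e.g. log hazard ratios) with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)      # pooled point estimate
    w_bar = statistics.fmean(variances)      # average within-imputation variance
    b = statistics.variance(estimates)       # between-imputation variance (n - 1 denominator)
    total_var = w_bar + (1 + 1 / m) * b      # Rubin's total variance
    return q_bar, total_var

def absolute_bias(hr_estimates, true_hr):
    """Absolute difference between the mean estimated HR across replicates and the true HR."""
    return abs(statistics.fmean(hr_estimates) - true_hr)

# Hypothetical log-HR estimates and variances from M = 5 imputed datasets in one replicate
log_hr, var = pool_rubin([-0.25, -0.20, -0.22, -0.18, -0.23],
                         [0.010, 0.011, 0.009, 0.010, 0.012])
pooled_hr = math.exp(log_hr)  # back-transform to the hazard ratio scale
```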
CONCLUSION
Our results support the use of multiple imputation and PS calibration to handle confounder variables that are missing completely at random or missing at random in EHR-based IPTW comparative effectiveness analyses, even when missingness is ≥50%. PS calibration represents a computationally efficient alternative to multiple imputation.
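One way to see why PS calibration is cheap is that it needs only one extra linear regression. The Python sketch below follows the regression-calibration idea under hypothetical assumptions: an error-prone PS that omits the partially missing confounder has already been estimated for every patient, and a gold-standard PS that includes it has been estimated in the patients with complete data. Within those complete cases, the logit of the gold-standard PS is regressed on the logit of the error-prone PS and treatment; the fitted coefficients then give a calibrated PS, and ATE-type IPTW weights, for all patients. All function names and toy values are illustrative, and the paper's exact calibration model may differ.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def expit(z):
    return 1 / (1 + math.exp(-z))

def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_calibration(ps_gold, ps_error, treated):
    """OLS of logit(gold-standard PS) on [1, logit(error-prone PS), treatment], complete cases only."""
    rows = [[1.0, logit(pe), float(a)] for pe, a in zip(ps_error, treated)]
    y = [logit(pg) for pg in ps_gold]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve(xtx, xty)

def calibrated_iptw(coef, ps_error, treated):
    """Calibrated PS for every patient, turned into ATE-type IPTW weights."""
    weights = []
    for pe, a in zip(ps_error, treated):
        ps = expit(coef[0] + coef[1] * logit(pe) + coef[2] * a)
        weights.append(a / ps + (1 - a) / (1 - ps))  # 1/PS if treated, 1/(1 - PS) if not
    return weights

# Toy complete-case subset where logit(gold PS) = 0.2 + 0.9 * logit(error PS) + 0.3 * treated
val_error = [0.2, 0.35, 0.5, 0.6, 0.75, 0.3]
val_treated = [0, 1, 0, 1, 1, 0]
val_gold = [expit(0.2 + 0.9 * logit(p) + 0.3 * a) for p, a in zip(val_error, val_treated)]
coef = fit_calibration(val_gold, val_error, val_treated)  # recovers ~[0.2, 0.9, 0.3]
weights = calibrated_iptw(coef, [0.1, 0.4, 0.8], [1, 0, 1])  # applied to all patients
```

By contrast, multiple imputation has to generate M imputed datasets and refit the PS and weighted outcome models in each one, which is consistent with the large difference in run time reported above.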