Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
Department of Biostatistics, Vanderbilt University, Nashville, Tennessee, USA.
Stat Med. 2021 Feb 10;40(3):631-649. doi: 10.1002/sim.8793. Epub 2020 Nov 2.
Medical studies that depend on electronic health records (EHR) data are often subject to measurement error because the data were not collected to support the research questions under study. If not accounted for in the analysis, these errors can obscure associations between patient exposures and disease risk or create spurious ones. Methodology to address covariate measurement error is well developed. Error in a time-to-event outcome has also been shown to cause significant bias, yet methods to address it are relatively underdeveloped. More generally, errors can arise in both the covariates and the time-to-event outcome, and these errors can be correlated. We propose regression calibration (RC) estimators to simultaneously address correlated error in the covariates and the censored event time. Although RC can perform well in many settings with covariate measurement error, it is biased for nonlinear regression models, such as the Cox model. Thus, we additionally propose raking estimators, which are consistent estimators of the parameter defined by the population estimating equation. Raking can improve upon RC in certain settings with failure-time data, requires no explicit modeling of the error structure, and can be used under outcome-dependent sampling designs. We discuss features of the underlying estimation problem that affect how much the raking estimator improves on the RC approach. Detailed simulation studies examine the performance of the proposed estimators under varying levels of signal, error, and censoring. The methodology is illustrated using observational EHR data on HIV outcomes from the Vanderbilt Comprehensive Care Clinic.
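As a rough illustration of the weight-adjustment step that raking relies on, the sketch below calibrates the design weights of a validation subsample so that weighted totals of auxiliary variables match their known full-cohort totals. It is not the authors' implementation. The function name rake_weights, the simulated error-prone covariate z, the simple random subsample, and the plain Newton solver (no step-halving) are all assumptions made for illustration; the abstract does not specify which auxiliary variables are used or how the weights feed into the Cox model.

```python
import numpy as np

def rake_weights(design_weights, aux, totals, tol=1e-8, max_iter=50):
    """Hypothetical helper: find w_i = d_i * exp(aux_i @ lam) such that
    the weighted totals sum_i w_i * aux_i equal the known cohort totals."""
    d = np.asarray(design_weights, dtype=float)
    A = np.asarray(aux, dtype=float)
    T = np.asarray(totals, dtype=float)
    lam = np.zeros(A.shape[1])
    for _ in range(max_iter):
        w = d * np.exp(A @ lam)
        gap = T - A.T @ w                  # calibration equations still unmet
        if np.max(np.abs(gap)) < tol:
            break
        jac = A.T @ (w[:, None] * A)       # derivative of weighted totals w.r.t. lam
        lam += np.linalg.solve(jac, gap)   # plain Newton step
    return d * np.exp(A @ lam)

# Simulated stand-in data (assumption): z is an error-prone covariate
# observed for the full cohort; validated data exist only for a subsample.
rng = np.random.default_rng(0)
N, n = 2000, 200
z = rng.normal(size=N)
aux_full = np.column_stack([np.ones(N), z])      # intercept + error-prone covariate
cohort_totals = aux_full.sum(axis=0)

sub = rng.choice(N, size=n, replace=False)       # simple random validation subsample
d = np.full(n, N / n)                            # inverse-probability design weights
w = rake_weights(d, aux_full[sub], cohort_totals)

raked_totals = aux_full[sub].T @ w               # matches cohort_totals after raking
```

The raked weights w would then replace the design weights when fitting the outcome model on the validated subsample. Because the weights are an exponential tilt of the design weights, they stay positive.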