Tao Ran, Lotspeich Sarah C, Amorim Gustavo, Shaw Pamela A, Shepherd Bryan E
Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
Stat Med. 2021 Feb 10;40(3):725-738. doi: 10.1002/sim.8799. Epub 2020 Nov 3.
In modern observational studies using electronic health records or other routinely collected data, both the outcome and covariates of interest can be error-prone, and their errors are often correlated. A cost-effective solution is the two-phase design, under which the error-prone outcome and covariates are observed for all subjects during the first phase, and that information is used to select a validation subsample for accurate measurement of these variables in the second phase. Previous research on two-phase measurement error problems has largely focused on scenarios where there are errors in covariates only or where the validation sample is a simple random sample of study subjects. Herein, we propose a semiparametric approach to general two-phase measurement error problems with a quantitative outcome, allowing for correlated errors in the outcome and covariates and arbitrary second-phase selection. We devise a computationally efficient and numerically stable expectation-maximization algorithm to maximize the nonparametric likelihood function. The resulting estimators possess desirable statistical properties. We demonstrate the superiority of the proposed methods over existing approaches through extensive simulation studies, and we illustrate their use in an observational HIV study.
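To make the two-phase setting concrete, the sketch below simulates a hypothetical study with correlated errors in the outcome and covariate, then draws a phase-two validation sample whose selection depends on the error-prone outcome. It is not the authors' semiparametric EM estimator. It only contrasts three simpler estimators: a naive fit on the error-prone data, an unweighted complete-case fit on the validation sample, and a standard inverse-probability-weighted (IPW) fit. The function names `simulate_two_phase` and `wls`, the linear model, the error variance, the correlation `rho`, and the tertile-based selection probabilities are all illustrative assumptions.

```python
import numpy as np


def wls(y, x, w):
    """Weighted least squares for y = b0 + b1 * x with weights w."""
    X = np.column_stack([np.ones_like(x), x])
    Xw = X * w[:, None]
    # Solve the weighted normal equations (X' W X) b = X' W y.
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)


def simulate_two_phase(n=5000, beta=(1.0, 0.5), err_var=0.5, rho=-0.4, seed=1):
    """Hypothetical two-phase study; all parameter values are assumptions."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                               # true covariate X
    y = beta[0] + beta[1] * x + rng.normal(size=n)       # true outcome Y

    # Phase 1: error-prone Y*, X* for everyone, with correlated errors.
    cov = err_var * np.array([[1.0, rho], [rho, 1.0]])
    e = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    y_star, x_star = y + e[:, 0], x + e[:, 1]

    # Phase 2: oversample the outer tertiles of Y* (selection depends on Y*).
    lo, hi = np.quantile(y_star, [1 / 3, 2 / 3])
    pi = np.where((y_star < lo) | (y_star > hi), 0.15, 0.05)
    r = rng.random(n) < pi                               # True = validated

    return {
        # Naive: regress error-prone Y* on error-prone X* for all subjects.
        "naive": wls(y_star, x_star, np.ones(n)),
        # Complete case: unweighted fit of true Y on true X, validated only;
        # biased in general because selection depends on Y* and hence on Y.
        "complete_case": wls(y[r], x[r], np.ones(r.sum())),
        # IPW: weight each validated subject by 1 / its selection probability.
        "ipw": wls(y[r], x[r], 1.0 / pi[r]),
        "phase2_fraction": r.mean(),
    }


if __name__ == "__main__":
    est = simulate_two_phase(n=100_000, seed=2021)
    for name in ("naive", "complete_case", "ipw"):
        print(name, np.round(est[name], 3))
```

Under these assumed settings the naive slope targets (0.5 + cov(e_Y, e_X)) / (1 + err_var) = (0.5 - 0.2) / 1.5 = 0.2 rather than the true 0.5, which shows how correlated errors can distort estimates beyond simple attenuation. IPW uses only the validated subsample and ignores the phase-one data on everyone else; more efficient use of that information is the motivation for likelihood-based approaches such as the one proposed here.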