Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA.
Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA.
Stat Med. 2021 Oct 15;40(23):5006-5024. doi: 10.1002/sim.9108. Epub 2021 Jun 22.
Measurement error is common in clinical research that relies on data from electronic health records or large observational cohorts. In particular, cohort studies of chronic diseases such as diabetes typically use self-reported outcomes to avoid the burden of expensive diagnostic tests. Dietary intake, which is also commonly collected by self-report and subject to measurement error, is a major factor linked to diabetes and other chronic diseases. These errors can bias exposure-disease associations and ultimately mislead clinical decision-making. We extend an existing semiparametric likelihood-based method for handling error-prone, discrete failure time outcomes to also address covariate error. We conduct an extensive numerical study comparing the proposed method with the naive approach that ignores measurement error, in terms of bias and efficiency in estimating the regression parameter of interest. In all settings considered, the proposed method showed minimal bias and maintained coverage probability, outperforming the naive analysis, which showed extreme bias and low coverage. We apply the method to data from the Women's Health Initiative to assess the association of energy and protein intake with the risk of incident diabetes mellitus. Correcting for errors in both the self-reported outcome and the dietary exposures yields hazard ratio estimates that differ considerably from those of analyses that ignore measurement error, demonstrating the importance of correcting for both outcome and covariate error.
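As a rough illustration of why ignoring covariate error can bias an exposure-outcome association, the following sketch simulates classical additive measurement error in a simple linear model. It is not the semiparametric likelihood method described in the abstract, and it does not involve failure time outcomes or outcome error. The function name `simulate_attenuation`, the linear outcome model, the unit-variance exposure, and the assumption that the error variance is known are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_attenuation(n=200_000, beta=1.0, sigma_u=1.0, seed=0):
    """Compare a naive slope with a reliability-corrected slope.

    Assumed toy model (not from the paper):
      x ~ N(0, 1)            true exposure, unobserved in practice
      w = x + u, u ~ N(0, sigma_u^2)   error-prone self-report
      y = beta * x + e, e ~ N(0, 1)    continuous outcome
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n)
    w = x + rng.normal(0.0, sigma_u, n)
    y = beta * x + rng.normal(0.0, 1.0, n)

    # Naive analysis: regress y on the error-prone w, ignoring the error.
    naive = np.cov(w, y)[0, 1] / np.var(w, ddof=1)

    # Reliability ratio var(x) / (var(x) + var(u)); assumed known here,
    # whereas in practice it would have to be estimated from validation
    # or calibration data.
    reliability = 1.0 / (1.0 + sigma_u**2)
    corrected = naive / reliability
    return naive, corrected


naive, corrected = simulate_attenuation()
print(f"naive slope: {naive:.3f}, corrected slope: {corrected:.3f}")
```

Under these assumptions the naive slope is attenuated toward zero by roughly the reliability ratio, while dividing by that ratio approximately recovers the true coefficient. Error in a discrete failure time outcome, which the proposed method also handles, is not captured by this toy example.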