Raking and regression calibration: Methods to address bias from correlated covariate and time-to-event error.

Affiliations

Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA.

Department of Biostatistics, Vanderbilt University, Nashville, Tennessee, USA.

Publication details

Stat Med. 2021 Feb 10;40(3):631-649. doi: 10.1002/sim.8793. Epub 2020 Nov 2.

DOI:10.1002/sim.8793

PMID:33140432

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7874496/

Abstract

Medical studies that depend on electronic health record (EHR) data are often subject to measurement error, as the data are not collected to support the research questions under study. These data errors, if not accounted for in study analyses, can obscure true associations or create spurious ones between patient exposures and disease risk. Methodology to address covariate measurement error is well developed; time-to-event error has also been shown to cause significant bias, but methods to address it are relatively underdeveloped. More generally, errors may occur in both the covariates and the time-to-event outcome, and these errors may be correlated. We propose regression calibration (RC) estimators to simultaneously address correlated error in the covariates and the censored event time. Although RC can perform well in many settings with covariate measurement error, it is biased for nonlinear regression models such as the Cox model. Thus, we additionally propose raking estimators, which are consistent estimators of the parameter defined by the population estimating equation. Raking can improve upon RC in certain settings with failure-time data, requires no explicit modeling of the error structure, and can be used under outcome-dependent sampling designs. We discuss features of the underlying estimation problem that affect how much the raking estimator improves on the RC approach. Detailed simulation studies examine the performance of the proposed estimators under varying levels of signal, error, and censoring. The methodology is illustrated with observational EHR data on HIV outcomes from the Vanderbilt Comprehensive Care Clinic.
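To make the raking idea concrete, the following is a minimal sketch of raking in the survey-calibration sense: design weights d_i = 1/π_i for a validated subsample are adjusted multiplicatively, w_i = d_i exp(a_i'λ), until the weighted totals of auxiliary variables a_i match their known full-cohort totals. It assumes simple random validation sampling and uses a toy auxiliary vector (an intercept plus one error-prone covariate). The function name `rake_weights` and the simulated data are hypothetical, and the paper's own choice of auxiliary variables and its Cox-model estimating equations are not reproduced here.

```python
import numpy as np

def rake_weights(d, A_sample, A_totals, tol=1e-10, max_iter=100):
    """Hypothetical helper: raking (exponential-distance calibration) of weights.

    d        : (n,) design weights 1/pi_i for the validated subsample
    A_sample : (n, p) auxiliary variables observed on the subsample
    A_totals : (p,) totals of the same auxiliaries over the full cohort
    Returns w = d * exp(A_sample @ lam) such that A_sample.T @ w == A_totals.
    """
    d = np.asarray(d, dtype=float)
    A = np.asarray(A_sample, dtype=float)
    T = np.asarray(A_totals, dtype=float)
    lam = np.zeros(A.shape[1])
    scale = max(1.0, np.max(np.abs(T)))      # relative convergence tolerance
    for _ in range(max_iter):
        w = d * np.exp(A @ lam)
        resid = A.T @ w - T                  # calibration equations g(lam)
        if np.max(np.abs(resid)) < tol * scale:
            return w
        jac = A.T @ (w[:, None] * A)         # dg/dlam = A' diag(w) A
        lam -= np.linalg.solve(jac, resid)   # Newton step
    raise RuntimeError("raking did not converge")

# Toy example (simulated, illustrative only)
rng = np.random.default_rng(0)
N, n = 2000, 400
x = rng.normal(size=N)                       # true covariate, known only if validated
x_star = x + rng.normal(scale=0.5, size=N)   # error-prone EHR version, known for all
A_full = np.column_stack([np.ones(N), x_star])
idx = rng.choice(N, size=n, replace=False)   # simple random validation sample
d = np.full(n, N / n)                        # design weights 1/pi_i
w = rake_weights(d, A_full[idx], A_full.sum(axis=0))
print(np.allclose(A_full[idx].T @ w, A_full.sum(axis=0)))
print("raked estimate of mean(x):", np.sum(w * x[idx]) / np.sum(w))
```

The intercept column forces the calibrated weights to sum to the cohort size N. In a generalized raking analysis one would typically plug such calibrated weights into a weighted estimating equation (for example, a weighted Cox model fit on the validated records); that step is omitted from this sketch.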


Similar articles

Improved generalized raking estimators to address dependent covariate and failure-time outcome error.

Biom J. 2021 Jun;63(5):1006-1027. doi: 10.1002/bimj.202000187. Epub 2021 Mar 11.

Considerations for analysis of time-to-event outcomes measured with error: Bias and correction with SIMEX.

Stat Med. 2018 Apr 15;37(8):1276-1289. doi: 10.1002/sim.7554. Epub 2017 Nov 29.

Semiparametric regression calibration for general hazard models in survival analysis with covariate measurement error; surprising performance under linear hazard.

Biometrics. 2021 Jun;77(2):561-572. doi: 10.1111/biom.13318. Epub 2020 Jun 25.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

Expected estimating equations via EM for proportional hazards regression with covariate misclassification.

Biostatistics. 2013 Apr;14(2):351-65. doi: 10.1093/biostatistics/kxs046. Epub 2012 Nov 23.

Analysis of composite endpoints with component-wise censoring in the presence of differential visit schedules.

Stat Med. 2022 Apr 30;41(9):1599-1612. doi: 10.1002/sim.9312. Epub 2022 Jan 18.

Regression analysis when covariates are regression parameters of a random effects model for observed longitudinal measurements.

Biometrics. 2000 Jun;56(2):487-95. doi: 10.1111/j.0006-341x.2000.00487.x.

Estimation in the Cox survival regression model with covariate measurement error and a changepoint.

Biom J. 2020 Sep;62(5):1139-1163. doi: 10.1002/bimj.201800085. Epub 2020 Jan 31.

Regression calibration to correct correlated errors in outcome and exposure.

Stat Med. 2021 Jan 30;40(2):271-286. doi: 10.1002/sim.8773. Epub 2020 Oct 21.

Cited by

Combining Straight-Line and Map-Based Distances to Investigate the Connection Between Proximity to Healthy Foods and Disease.

Stat Med. 2025 Mar 30;44(7):e70054. doi: 10.1002/sim.70054.

Optimal multiwave validation of secondary use data with outcome and exposure misclassification.

Can J Stat. 2024 Jun;52(2):532-554. doi: 10.1002/cjs.11772. Epub 2023 Mar 31.

Multivariate longitudinal analysis for the association between brain atrophy and cognitive impairment in prodromal Huntington's disease subjects.

J R Stat Soc Ser C Appl Stat. 2024 Jan;73(1):104-122. doi: 10.1093/jrsssc/qlad087. Epub 2023 Sep 13.

AFFECT: an R package for accelerated functional failure time model with error-contaminated survival times and applications to gene expression data.

BMC Bioinformatics. 2024 Aug 13;25(1):265. doi: 10.1186/s12859-024-05831-5.

Three-phase generalized raking and multiple imputation estimators to address error-prone data.

Stat Med. 2024 Jan 30;43(2):379-394. doi: 10.1002/sim.9967. Epub 2023 Nov 21.

Associations Between Gestational Weight Gain, Gestational Diabetes, and Childhood Obesity Incidence.

Matern Child Health J. 2024 Feb;28(2):372-381. doi: 10.1007/s10995-023-03853-8. Epub 2023 Nov 15.

Combining chains of Bayesian models with Markov melding.

Bayesian Anal. 2022 Jan 1;18(3):807-840. doi: 10.1214/22-BA1327.

An imputation approach for a time-to-event analysis subject to missing outcomes due to noncoverage in disease registries.

Biostatistics. 2023 Dec 15;25(1):117-133. doi: 10.1093/biostatistics/kxac049.

Errors in multiple variables in human immunodeficiency virus (HIV) cohort and electronic health record data: statistical challenges and opportunities.

Stat Commun Infect Dis. 2020 Oct 7;12(Suppl1):20190015. doi: 10.1515/scid-2019-0015. eCollection 2020 Sep 1.

Multiwave validation sampling for error-prone electronic health records.

Biometrics. 2023 Sep;79(3):2649-2663. doi: 10.1111/biom.13713. Epub 2022 Jul 11.
