Improved generalized raking estimators to address dependent covariate and failure-time outcome error.

Affiliations

Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, USA.

Department of Biostatistics, Vanderbilt University, Nashville, TN, USA.

Publication information

Biom J. 2021 Jun;63(5):1006-1027. doi: 10.1002/bimj.202000187. Epub 2021 Mar 11.

DOI: 10.1002/bimj.202000187

PMID: 33709462

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8211389/

Abstract

Biomedical studies that use electronic health record (EHR) data for inference are often subject to bias from measurement error. The measurement error in EHR data is typically complex: it includes errors of unknown functional form in both the covariates and the outcome, and these errors can be dependent. Generalized raking has recently been proposed as a robust way to address the resulting bias, yielding consistent estimates without requiring a model for the error structure. We explain why these previously proposed raking estimators can be expected to be inefficient in failure-time outcome settings that involve misclassification of the event indicator. To improve efficiency, we propose raking estimators that use multiple imputation to impute either the target variables or the auxiliary variables. We also consider outcome-dependent sampling designs and investigate how they affect the efficiency of the raking estimators, with and without multiple imputation. An extensive numerical study examines the performance of the proposed estimators across a range of measurement error settings. Finally, we apply the proposed methods to our motivating setting: an analysis of HIV outcomes in an observational cohort with EHR data from the Vanderbilt Comprehensive Care Clinic.
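The abstract refers to generalized raking without spelling out the mechanics. The following is a minimal sketch of the generic raking (calibration) step, assuming a two-phase design in which an error-prone variable is recorded for the whole cohort and the validated value only for a subsample. Design weights are adjusted multiplicatively so that weighted subsample totals of the auxiliary variables match their full-cohort totals. This illustrates the standard calibration idea only and is not the authors' implementation: the function name `rake_weights`, the simulated data, and the choice of auxiliaries (an intercept and the error-prone value) are hypothetical. In the paper's setting the target is a failure-time model parameter, and the abstract describes using multiple imputation to obtain either the target or the auxiliary variables.

```python
import numpy as np


def rake_weights(d, X, totals, max_iter=50, tol=1e-8):
    """Raking (calibration) of design weights, solved by Newton's method.

    d      : design weights of the validated (phase-2) subjects, shape (n,)
    X      : auxiliary variables for those subjects, shape (n, p)
    totals : full-cohort (phase-1) totals of the auxiliaries, shape (p,)

    Returns w_i = d_i * exp(x_i' lam) with sum_i w_i x_i == totals.
    """
    d = np.asarray(d, dtype=float)
    X = np.asarray(X, dtype=float)
    totals = np.asarray(totals, dtype=float)
    lam = np.zeros(X.shape[1])
    for _ in range(max_iter):
        w = d * np.exp(X @ lam)
        resid = totals - X.T @ w                 # unmet calibration equations
        if np.max(np.abs(resid)) < tol:
            return w
        jac = X.T @ (w[:, None] * X)             # Jacobian of X'w with respect to lam
        lam = lam + np.linalg.solve(jac, resid)  # Newton update
    raise RuntimeError("raking did not converge")


# Hypothetical two-phase example with simulated data (not from the paper).
rng = np.random.default_rng(1)
N, n = 1000, 200
x_star = rng.normal(size=N)                       # error-prone value, whole cohort
x_true = x_star + rng.normal(scale=0.5, size=N)   # validated value (simulated)
idx = rng.choice(N, size=n, replace=False)        # simple random validation subsample
d = np.full(n, N / n)                             # inverse sampling-probability weights
A = np.column_stack([np.ones(n), x_star[idx]])    # auxiliaries: intercept, error-prone value
T = np.array([N, x_star.sum()])                   # their full-cohort totals

w = rake_weights(d, A, T)
raked_mean = np.sum(w * x_true[idx]) / N          # raking estimate of the mean of x_true
ipw_mean = np.sum(d * x_true[idx]) / N            # uncalibrated weighted estimate
```

Because the error-prone variable is correlated with the validated one, matching its full-cohort total typically lowers the variance of the weighted estimate relative to the uncalibrated weights, while the exponential (raking) form keeps every weight positive.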
