Assessing risk model calibration with missing covariates.

Author information

Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20850, USA.

Publication information

Biostatistics. 2022 Jul 18;23(3):875-890. doi: 10.1093/biostatistics/kxaa060.

DOI: 10.1093/biostatistics/kxaa060

PMID: 33616159

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9608650/

Abstract

When validating a risk model in an independent cohort, some predictors may be missing for some subjects. Missingness can be unplanned or by design, as in case-cohort or nested case-control studies, in which some covariates are measured only in subsampled subjects. Weighting methods and imputation are used to handle missing data. We propose methods to increase the efficiency of weighting to assess calibration of a risk model (i.e. bias in model predictions), which is quantified by the ratio of the number of observed events, $\mathcal{O}$, to expected events, $\mathcal{E}$, computed from the model. We adjust known inverse probability weights by incorporating auxiliary information available for all cohort members. We use survey calibration that requires the weighted sum of the auxiliary statistics in the complete data subset to equal their sum in the full cohort. We show that a pseudo-risk estimate that approximates the actual risk value but uses only variables available for the entire cohort is an excellent auxiliary statistic to estimate $\mathcal{E}$. We derive analytic variance formulas for $\mathcal{O}/\mathcal{E}$ with adjusted weights. In simulations, weight adjustment with pseudo-risk was much more efficient than inverse probability weighting and yielded consistent estimates even when the pseudo-risk was a poor approximation. Multiple imputation was often efficient but yielded biased estimates when the imputation model was misspecified. Using these methods, we assessed calibration of an absolute risk model for second primary thyroid cancer in an independent cohort.
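
The weight adjustment described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation: it assumes linear (GREG-type) calibration, which is one common form of survey calibration, and the paper may use a different distance function. The cohort, the logistic risk formula, the sampling fractions, and all names (`calibrate_weights`, `pseudo`, `risk`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def calibrate_weights(w, aux_sample, aux_totals):
    """Linear calibration (an assumed variant): adjust design weights w so the
    weighted sums of the auxiliary columns in the sample equal aux_totals."""
    w = np.asarray(w, dtype=float)
    A = np.asarray(aux_sample, dtype=float)
    # Gap between full-cohort totals and the current weighted sample totals
    gap = np.asarray(aux_totals, dtype=float) - A.T @ w
    # Solve (sum_i w_i a_i a_i^T) lam = gap for the adjustment coefficients
    M = A.T @ (w[:, None] * A)
    lam = np.linalg.solve(M, gap)
    return w * (1.0 + A @ lam)

# Hypothetical cohort: x1 is known for everyone, x2 only in the subsample
rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
risk = 1.0 / (1.0 + np.exp(-(-2.0 + 0.8 * x1 + 0.5 * x2)))  # model risk, needs x2
pseudo = 1.0 / (1.0 + np.exp(-(-2.0 + 0.8 * x1)))           # pseudo-risk, x1 only
events = rng.binomial(1, risk)                               # observed for all

# Case-cohort-like design (assumed): all cases, 20% of non-cases
pi = np.where(events == 1, 1.0, 0.2)
sampled = rng.random(n) < pi
w = 1.0 / pi[sampled]                                        # known IPW weights

# Auxiliary statistics: a constant and the pseudo-risk, with full-cohort totals
aux = np.column_stack([np.ones(sampled.sum()), pseudo[sampled]])
totals = np.array([float(n), pseudo.sum()])
w_cal = calibrate_weights(w, aux, totals)

O = events.sum()                         # observed events in the full cohort
E_ipw = np.sum(w * risk[sampled])        # expected events, plain IPW
E_cal = np.sum(w_cal * risk[sampled])    # expected events, calibrated weights
print("O/E (IPW):       ", O / E_ipw)
print("O/E (calibrated):", O / E_cal)
```

By construction, the calibrated weights reproduce the full-cohort totals of the auxiliary statistics exactly. How much precision is gained for $\mathcal{E}$ depends on how closely the pseudo-risk tracks the actual risk.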

Similar articles

Propensity score analysis with partially observed covariates: How should multiple imputation be used?

Stat Methods Med Res. 2019 Jan;28(1):3-19. doi: 10.1177/0962280217713032. Epub 2017 Jun 2.

Evaluation of multiple imputation approaches for handling missing covariate information in a case-cohort study with a binary outcome.

BMC Med Res Methodol. 2022 Apr 3;22(1):87. doi: 10.1186/s12874-021-01495-4.

Inverse Probability of Treatment Weighting and Confounder Missingness in Electronic Health Record-based Analyses: A Comparison of Approaches Using Plasmode Simulation.

Epidemiology. 2023 Jul 1;34(4):520-530. doi: 10.1097/EDE.0000000000001618. Epub 2023 Apr 26.

Weight calibration to improve the efficiency of pure risk estimates from case-control samples nested in a cohort.

Biometrics. 2020 Dec;76(4):1087-1097. doi: 10.1111/biom.13209. Epub 2020 Jan 2.

On the use of multiple imputation to address data missing by design as well as unintended missing data in case-cohort studies with a binary endpoint.

BMC Med Res Methodol. 2023 Dec 7;23(1):287. doi: 10.1186/s12874-023-02090-5.

Comparison between inverse-probability weighting and multiple imputation in Cox model with missing failure subtype.

Stat Methods Med Res. 2024 Feb;33(2):344-356. doi: 10.1177/09622802231226328. Epub 2024 Jan 23.

Responsiveness-informed multiple imputation and inverse probability-weighting in cohort studies with missing data that are non-monotone or not missing at random.

Stat Methods Med Res. 2018 Feb;27(2):352-363. doi: 10.1177/0962280216628902. Epub 2016 Mar 16.

Statistical methods for incomplete data: Some results on model misspecification.

Stat Methods Med Res. 2017 Feb;26(1):248-267. doi: 10.1177/0962280214544251. Epub 2016 Jul 11.

Weight calibration to improve efficiency for estimating pure risks from the additive hazards model with the nested case-control design.

Biometrics. 2022 Mar;78(1):179-191. doi: 10.1111/biom.13413. Epub 2020 Dec 18.

Cited by

Nested case-control sampling without replacement.

Lifetime Data Anal. 2024 Oct;30(4):776-799. doi: 10.1007/s10985-024-09633-y. Epub 2024 Sep 5.

References

Weight calibration to improve the efficiency of pure risk estimates from case-control samples nested in a cohort.

Biometrics. 2020 Dec;76(4):1087-1097. doi: 10.1111/biom.13209. Epub 2020 Jan 2.

Multiple imputation of missing data in nested case-control and case-cohort studies.

Biometrics. 2018 Dec;74(4):1438-1449. doi: 10.1111/biom.12910. Epub 2018 Jun 5.

A simple method to estimate the time-dependent receiver operating characteristic curve and the area under the curve with right censored data.

Stat Methods Med Res. 2018 Aug;27(8):2264-2278. doi: 10.1177/0962280216680239. Epub 2016 Nov 28.

Assessing the goodness of fit of personal risk models.

Stat Med. 2014 Aug 15;33(18):3179-90. doi: 10.1002/sim.6176. Epub 2014 Apr 22.

Two-stage sampling designs for external validation of personal risk models.

Stat Methods Med Res. 2016 Aug;25(4):1313-29. doi: 10.1177/0962280213480420. Epub 2013 Apr 16.

Absolute risk prediction of second primary thyroid cancer among 5-year survivors of childhood cancer.

J Clin Oncol. 2013 Jan 1;31(1):119-27. doi: 10.1200/JCO.2012.41.8996. Epub 2012 Nov 19.

Multiple imputation of missing covariates with non-linear effects and interactions: an evaluation of statistical methods.

BMC Med Res Methodol. 2012 Apr 10;12:46. doi: 10.1186/1471-2288-12-46.

Risk prediction measures for case-cohort and nested case-control designs: an application to cardiovascular disease.

Am J Epidemiol. 2012 Apr 1;175(7):715-24. doi: 10.1093/aje/kwr374. Epub 2012 Mar 6.

Imputing missing covariate values for the Cox model.

Stat Med. 2009 Jul 10;28(15):1982-98. doi: 10.1002/sim.3618.

Nested case-control and case-cohort methods of sampling from a cohort: a critical comparison.

Am J Epidemiol. 1990 Jan;131(1):169-76. doi: 10.1093/oxfordjournals.aje.a115471.
