Multiple imputation analysis of case-cohort studies.

Affiliation

Inserm, CESP Centre for Research in Epidemiology and Population Health, U1018, Biostatistics team, F-94807 Villejuif, France.

Publication information

Stat Med. 2011 Jun 15;30(13):1595-607. doi: 10.1002/sim.4130. Epub 2011 Feb 24.

DOI:10.1002/sim.4130

PMID:21351290

Abstract

The usual methods for analyzing case-cohort studies rely on weighted estimators that are sometimes not fully efficient. Multiple imputation might be a good alternative because it uses all the available data and approximates the maximum partial likelihood estimator. The method is based on generating several plausible complete data sets, taking into account uncertainty about the missing values. When the imputation model is correctly specified, the multiple imputation estimator is asymptotically unbiased and its variance is correctly estimated. We show that a correct imputation model must be estimated from the fully observed data (cases and controls), with case status included among the explanatory variables. To validate the approach, we analyzed case-cohort studies first with completely simulated data and then with case-cohort data sampled from two real cohorts. The analyses of simulated data showed that, when the imputation model was correct, the multiple imputation estimator was unbiased and efficient. The observed gain in precision ranged from 8 to 37 per cent for phase-1 variables and from 5 to 19 per cent for the phase-2 variable. When the imputation model was misspecified, the multiple imputation estimator was still more efficient than the weighted estimators, but it was also slightly biased. The analyses of case-cohort data sampled from complete cohorts showed that, even when no strong predictor of the phase-2 variable was available, the multiple imputation estimator was unbiased, as precise as the weighted estimator for the phase-2 variable, and slightly more precise than the weighted estimators for the phase-1 variables. However, the multiple imputation estimator was found to be biased when, because of interaction terms, some coefficients of the imputation model had to be estimated from small samples. Multiple imputation is an efficient technique for analyzing case-cohort data. In practice, we suggest building the analysis model using only the case-cohort data and weighted estimators. Multiple imputation can then be used to reanalyze the data with the selected model in order to improve the precision of the results.
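
The abstract does not say how the estimates from the several completed data sets are combined. As a minimal sketch, assuming the standard combining rules for multiple imputation (Rubin's rules), the pooled coefficient and its variance could be computed as below. The function name `pool_rubin` and the input numbers are hypothetical illustrations, not taken from the paper.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed data sets (Rubin's rules).

    estimates: the m point estimates (e.g. a log hazard ratio per data set)
    variances: the m squared standard errors of those estimates
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    pooled = mean(estimates)      # average of the m point estimates
    within = mean(variances)      # within-imputation variance
    between = variance(estimates) # between-imputation variance (divisor m - 1)

    # The between-imputation term carries the uncertainty about the
    # missing values; (1 + 1/m) corrects for the finite number of imputations.
    total = within + (1 + 1 / m) * between
    return pooled, total


if __name__ == "__main__":
    # Hypothetical estimates from m = 3 imputed data sets.
    est, var = pool_rubin([0.5, 0.6, 0.7], [0.04, 0.04, 0.04])
    print(f"pooled estimate = {est:.3f}, standard error = {var ** 0.5:.3f}")
```

In this sketch the total variance exceeds the average within-imputation variance whenever the estimates differ across imputed data sets, which is how the uncertainty about the missing phase-2 values enters the final standard error.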
