Breslow Norman E, Lumley Thomas, Ballantyne Christie M, Chambless Lloyd E, Kulich Michal
Department of Biostatistics, University of Washington, Seattle, WA, USA, Tel.: +1-206-543-2035.
Stat Biosci. 2009 May 1;1(1):32. doi: 10.1007/s12561-009-9001-6.
The case-cohort study involves two-phase sampling: simple random sampling from an infinite super-population at phase one and stratified random sampling from a finite cohort at phase two. Standard analyses of case-cohort data solve inverse probability weighted (IPW) estimating equations, with weights determined by the known phase-two sampling fractions. The variance of parameter estimates in (semi)parametric models, including the Cox model, is the sum of two terms: (i) the model-based variance of the usual estimates that would be calculated if full data were available for the entire cohort; and (ii) the design-based variance from IPW estimation of the unknown cohort total of the efficient influence function (IF) contributions. This second variance component may be reduced by adjusting the sampling weights, either by calibrating them to known cohort totals of auxiliary variables correlated with the IF contributions or by estimating them using these same auxiliary variables. Both adjustment methods are implemented in the R survey package. We derive the limit laws of coefficients estimated using adjusted weights. The asymptotic results suggest practical methods for constructing auxiliary variables, which are evaluated by simulation of case-cohort samples from the National Wilms Tumor Study and by log-linear modeling of case-cohort data from the Atherosclerosis Risk in Communities Study. Although not semiparametric efficient, estimators based on adjusted weights may come close to achieving full efficiency within the class of augmented IPW estimators.
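To make the calibration idea concrete, the following is a minimal Python sketch of linear (regression-type) calibration of phase-two design weights to known cohort totals. It is an illustration under stated assumptions, not the authors' implementation and not the interface of the R survey package: the function name `calibrate_weights`, the toy cohort, the 10% case rate, and the control sample size of 300 are all hypothetical. It assumes a single auxiliary variable known for every cohort member, with all cases and a simple random sample of non-cases taken at phase two.

```python
import random


def calibrate_weights(design_weights, aux, total_n, total_aux):
    """Linear calibration with an intercept and one auxiliary variable.

    Returns weights w_i = d_i * (1 + l0 + l1 * x_i) chosen so that
    sum(w_i) equals total_n and sum(w_i * x_i) equals total_aux.
    """
    # Design-weighted sample moments of the auxiliary variable.
    s0 = sum(design_weights)
    s1 = sum(d * x for d, x in zip(design_weights, aux))
    s2 = sum(d * x * x for d, x in zip(design_weights, aux))
    # Gaps between the known cohort totals and their IPW estimates.
    g0 = total_n - s0
    g1 = total_aux - s1
    # Solve [[s0, s1], [s1, s2]] [l0, l1]^T = [g0, g1]^T by Cramer's rule.
    det = s0 * s2 - s1 * s1
    if det == 0:
        raise ValueError("auxiliary variable has no variation in the sample")
    l0 = (g0 * s2 - s1 * g1) / det
    l1 = (s0 * g1 - s1 * g0) / det
    return [d * (1 + l0 + l1 * x) for d, x in zip(design_weights, aux)]


# Hypothetical toy cohort: auxiliary value x and a case indicator per subject.
random.seed(1)
cohort = [(random.gauss(0.0, 1.0), random.random() < 0.1) for _ in range(2000)]
cohort_size = len(cohort)
cohort_total_x = sum(x for x, _ in cohort)

# Phase two: keep every case, draw a simple random sample of non-cases.
cases = [x for x, is_case in cohort if is_case]
controls = [x for x, is_case in cohort if not is_case]
sampled_controls = random.sample(controls, 300)

# Design weights are the inverse of the known sampling fractions.
sample_x = cases + sampled_controls
design = [1.0] * len(cases) + [len(controls) / 300] * len(sampled_controls)

weights = calibrate_weights(design, sample_x, cohort_size, cohort_total_x)
```

After calibration, the weighted sample totals of the intercept and of the auxiliary variable match the known cohort totals exactly. The abstract's point is that when the auxiliary variable is correlated with the influence function contributions, this constraint also tends to reduce the design-based part of the variance; the sketch demonstrates only the weight adjustment itself, not the variance reduction.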