Cai Tianxi, Zheng Yingye
Department of Biostatistics, Harvard University, Boston, MA, USA.
Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
J Am Stat Assoc. 2013 Jan 1;108(504):1532-1544. doi: 10.1080/01621459.2013.856715.
The nested case-control (NCC) design has been widely adopted as a cost-effective solution in many large cohort studies for risk assessment with expensive markers, such as the emerging biologic and genetic markers. To analyze data from NCC studies, methods based on conditional logistic regression (Goldstein and Langholz, 1992; Borgan et al., 1995) and maximum likelihood (Scheike and Juul, 2004; Zeng et al., 2006) have been proposed. However, most of these methods either cannot be easily extended beyond the Cox model (Cox, 1972) or require additional modeling assumptions. More generally applicable approaches based on inverse probability weighting (IPW) have been proposed as useful alternatives (Samuelsen, 1997; Chen, 2001; Samuelsen et al., 2007). However, because of the complex correlation structure induced by repeated finite risk set sampling, interval estimation for such IPW estimators remains challenging, especially when the estimation involves non-smooth objective functions or when making simultaneous inferences about functions. Standard resampling procedures such as the bootstrap cannot accommodate this correlation and thus are not directly applicable. In this paper, we propose a resampling procedure that can provide valid estimates for the distribution of a broad class of IPW estimators. Simulation results suggest that the proposed procedures perform well in settings where an analytical variance estimator is infeasible to derive or performs suboptimally. The new procedures are illustrated with data from the Framingham Offspring Study to characterize individual-level cardiovascular risks over time based on the Framingham risk score, C-reactive protein (CRP), and a genetic risk score.
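To make the IPW idea concrete, the sketch below computes inclusion probabilities of the kind associated with Samuelsen (1997): a case is included with probability 1, and a non-case is included if it is drawn as a control at any case time at which it is at risk. This is only an illustration of the weighting step, not the resampling procedure proposed in the paper. The function names and the toy inputs are hypothetical, and the sketch assumes no tied event times, no left truncation, and m controls drawn without replacement from the risk set excluding the case.

```python
from typing import List, Sequence


def samuelsen_inclusion_probs(
    times: Sequence[float], events: Sequence[int], m: int
) -> List[float]:
    """Probability that each cohort member enters the NCC sample.

    times  : observed follow-up times (event or censoring)
    events : 1 if the subject is a case, 0 if censored
    m      : number of controls sampled per case (assumed design parameter)
    """
    n = len(times)
    case_times = [times[i] for i in range(n) if events[i] == 1]
    probs = []
    for j in range(n):
        if events[j] == 1:
            # Cases are always included in the NCC sample.
            probs.append(1.0)
            continue
        not_sampled = 1.0
        for t in case_times:
            if times[j] >= t:  # subject j is at risk at case time t
                n_at_risk = sum(1 for k in range(n) if times[k] >= t)
                # m controls are drawn from the n_at_risk - 1 non-case members.
                frac = min(1.0, m / (n_at_risk - 1))
                not_sampled *= 1.0 - frac
        probs.append(1.0 - not_sampled)
    return probs


def ipw_weights(probs: Sequence[float], sampled: Sequence[bool]) -> List[float]:
    """Inverse probability weights; subjects outside the NCC sample get weight 0."""
    return [1.0 / p if s and p > 0 else 0.0 for p, s in zip(probs, sampled)]
```

Because the same subject can be at risk in many sampled risk sets, the resulting weights are correlated across subjects, which is the feature that makes a naive bootstrap inappropriate in this setting.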