Rebecca Payne, Matey Neykov, Majken Karoline Jensen, Tianxi Cai
Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts 02115, U.S.A.
Biometrics. 2016 Jun;72(2):372-81. doi: 10.1111/biom.12452. Epub 2015 Dec 21.
Large assembled cohorts with banked biospecimens offer valuable opportunities to identify novel markers for risk prediction. When the outcome of interest is rare, an effective strategy to conserve limited biological resources while maintaining reasonable statistical power is the case-cohort (CCH) sampling design, in which expensive markers are measured on a subset of cases and controls. However, the CCH design introduces significant analytical complexity due to outcome-dependent, finite-population sampling. Current methods for analyzing CCH studies focus primarily on estimation in simple survival models with linear effects; testing and estimation procedures that can efficiently capture complex non-linear marker effects in CCH data remain elusive. In this article, we propose inverse probability weighted (IPW) variance-component-type tests for identifying important marker sets through the Cox proportional hazards kernel machine (CoxKM) regression framework previously considered for full-cohort studies (Cai et al., 2011). The optimal choice of kernel, while vital for attaining high power, is typically unknown for a given dataset. Thus, we also develop robust testing procedures that adaptively combine information from multiple kernels. The proposed IPW test statistics have complex null distributions that cannot easily be approximated analytically. Furthermore, because of the correlation induced by CCH sampling, standard resampling methods such as the bootstrap fail to approximate the distribution correctly. We therefore propose a novel perturbation resampling scheme that can effectively recover the induced correlation structure. Results from extensive simulation studies suggest that the proposed IPW CoxKM testing procedures work well in finite samples. The proposed methods are further illustrated by application to a Danish CCH study of Apolipoprotein C-III markers and the risk of coronary heart disease.
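The following is a minimal Python/NumPy sketch of what an IPW kernel-machine score test on a case-cohort sample can look like. It is not the authors' implementation, and every name and choice in it is a hypothetical assumption rather than something taken from the article:

- `cch_weights` gives cases weight 1 and gives non-case subcohort members the inverse of the non-case sampling fraction.
- The statistic is the weighted quadratic form Q = (w∘M̂)ᵀ K (w∘M̂), where M̂ holds martingale residuals from a covariate-free Cox null model fitted with a weighted Nelson-Aalen estimator.
- The null distribution comes from i.i.d. Gaussian multipliers on per-subject score contributions. This treats sampled subjects as independent, so it does *not* recover the correlation induced by drawing a fixed-size subcohort, which is the problem the proposed perturbation scheme is designed to solve.
- Kernels are combined with a minimum p-value rule calibrated on the same perturbation draws. This is one common choice, and the paper's combination rule may differ.

```python
import numpy as np

def cch_weights(delta, subcohort):
    """Hypothetical IPW weights: cases get weight 1; non-case subcohort members get
    (# non-cases in cohort) / (# non-cases in subcohort); unsampled subjects get 0."""
    delta = np.asarray(delta, dtype=bool)
    sub = np.asarray(subcohort, dtype=bool)
    controls = sub & ~delta
    w = np.where(delta, 1.0, 0.0)
    w[controls] = (~delta).sum() / controls.sum()
    return w, delta | sub  # weights and mask of subjects with measured markers

def score_contributions(time, delta, w):
    """Row i of the returned matrix holds the coefficients of subject i's weighted
    score contribution, w_i * int (phi(Z_i) - phi_bar(t)) dM_i(t), written as a linear
    combination of the feature maps phi(Z_m). Column sums equal w * martingale residuals."""
    d = delta.astype(float)
    R = (time[None, :] >= time[:, None]).astype(float)  # R[j, k] = 1{T_k >= T_j}
    S = R @ w                                           # weighted risk-set size at T_j
    C = R * w[None, :] / S[:, None]                     # row j: weighted risk-set average at T_j
    h = w * d / S                                       # weighted Nelson-Aalen increment at T_j
    Lam = R.T @ h                                       # cumulative hazard at T_i
    A = d[:, None] * (np.eye(len(w)) - C) - np.diag(Lam) + R.T @ (h[:, None] * C)
    return w[:, None] * A

def gaussian_kernel(Z, rho):
    """Gaussian kernel exp(-||z_i - z_j||^2 / rho); rho is an assumed bandwidth."""
    sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / rho)

def multi_kernel_test(time, delta, w, kernels, n_perturb=2000, seed=0):
    """Per-kernel p-values and a min-p omnibus p-value. The i.i.d. Gaussian multipliers
    ignore the finite-population correlation from CCH sampling (simplification)."""
    A = score_contributions(time, delta, w)
    r = A.sum(axis=0)                                   # w * martingale residuals
    G = np.random.default_rng(seed).standard_normal((n_perturb, len(w)))
    U = G @ A                                           # perturbed score coefficients
    B = n_perturb
    p_obs, p_star = [], []
    for K in kernels:
        q = r @ K @ r                                   # observed statistic
        q_star = np.einsum("bi,ij,bj->b", U, K, U)      # perturbed statistics
        srt = np.sort(q_star)
        p_obs.append((1 + B - np.searchsorted(srt, q, side="left")) / (B + 1))
        p_star.append((B - np.searchsorted(srt, q_star, side="left")) / B)
    min_obs = min(p_obs)
    min_star = np.min(p_star, axis=0)                   # min-p within each perturbation draw
    p_omni = (1 + np.sum(min_star <= min_obs)) / (B + 1)
    return p_obs, p_omni
```

In use, `cch_weights` is applied to the full cohort's event indicators and subcohort mask. All arrays are then restricted to the returned `sampled` mask before the remaining functions are called, because markers exist only for those subjects. Writing each contribution through the coefficient matrix `A` keeps every kernel in Gram-matrix form, so linear and Gaussian kernels can share the same perturbation draws.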