Delcoigne Bénédicte, Støer Nathalie C, Reilly Marie
Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden.
National Advisory Unit for Women's Health, Oslo University Hospital, Oslo, Norway.
Int J Epidemiol. 2018 Jun 1;47(3):841-849. doi: 10.1093/ije/dyx282.
It is not uncommon for investigators to conduct further analyses of subgroups, using data collected in a nested case-control design. Since the sampling of the participants is related to the outcome of interest, the data at hand are not a representative sample of the population, and subgroup analyses need to be carefully considered for their validity and interpretation.
We performed simulation studies, generating cohorts within the proportional hazards model framework and with covariate coefficients chosen to mimic realistic data and more extreme situations. From the cohorts we sampled nested case-control data and analysed the effect of a binary exposure on a time-to-event outcome in subgroups defined by a covariate (an independent risk factor, a confounder or an effect modifier) and compared the estimates with the corresponding subcohort estimates. Cohort analyses were performed with Cox regression, and nested case-control samples or restricted subsamples were analysed with both conditional logistic regression and weighted Cox regression.
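The sampling step described above can be sketched in a few lines. This is a minimal illustration and not the authors' simulation code: the function names (`simulate_cohort`, `sample_ncc`), the exponential baseline hazard, the exposure and covariate prevalences, the coefficient values and the administrative censoring time are all assumptions chosen for the example. The inclusion probabilities follow the commonly used Samuelsen-type formula; the abstract does not state which weights were used in the weighted Cox regression.

```python
import numpy as np

def simulate_cohort(n, beta_x, beta_z, base_rate=0.01, max_follow_up=10.0, seed=0):
    """Cohort under a proportional hazards model with a constant baseline hazard (assumed)."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.3, n)            # binary exposure (prevalence assumed)
    z = rng.binomial(1, 0.5, n)            # binary covariate defining the subgroups (assumed)
    rate = base_rate * np.exp(beta_x * x + beta_z * z)
    event_time = rng.exponential(1.0 / rate)
    time = np.minimum(event_time, max_follow_up)   # administrative censoring
    event = event_time <= max_follow_up
    return x, z, time, event

def sample_ncc(time, event, m, seed=0):
    """Risk-set sampling of up to m controls per case, plus inclusion probabilities."""
    rng = np.random.default_rng(seed)
    prob_never_sampled = np.ones(len(time))
    sets = []
    for case in np.flatnonzero(event):
        at_risk = np.flatnonzero(time >= time[case])
        candidates = at_risk[at_risk != case]
        k = min(m, len(candidates))
        controls = rng.choice(candidates, size=k, replace=False)
        sets.append((case, controls))
        if len(candidates) > 0:
            # Each candidate escapes selection in this risk set with probability 1 - k/|candidates|.
            prob_never_sampled[candidates] *= 1.0 - k / len(candidates)
    p_incl = 1.0 - prob_never_sampled
    p_incl[event] = 1.0                    # cases are always included
    return sets, p_incl

# Usage: draw a nested case-control sample, then restrict it to the subgroup z == 1.
x, z, time, event = simulate_cohort(2000, beta_x=0.5, beta_z=0.7, seed=42)
sets, p_incl = sample_ncc(time, event, m=2, seed=42)

sampled = np.zeros(len(time), dtype=bool)
for case, controls in sets:
    sampled[case] = True
    sampled[controls] = True

subgroup = sampled & (z == 1)
weights = 1.0 / p_incl[subgroup]           # inverse-probability weights for a weighted Cox fit
print(f"cases: {event.sum()}, sampled: {sampled.sum()}, sampled with z == 1: {subgroup.sum()}")
```

The matched sets in `sets` are the input a conditional logistic regression would need, while the sampled individuals with their `weights` are what a weighted Cox regression would use; fitting either model is left to a survival analysis library.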
In all scenarios studied, the subgroup analyses provided unbiased estimates of the exposure coefficients; conditional logistic regression was less efficient than weighted Cox regression.
For the study of a subpopulation, analysis of the corresponding subgroup of individuals sampled in a nested case-control design provides an unbiased estimate of the effect of exposure, regardless of whether the variable used to define the subgroup is a confounder, effect modifier or independent risk factor. Weighted Cox regression provides more efficient estimates than conditional logistic regression.