Health Informatics Institute, Department of Pediatrics, University of South Florida Morsani College of Medicine, Tampa, FL, USA.
Diabetes Metab Res Rev. 2020 Jan;36(1):e3204. doi: 10.1002/dmrr.3204. Epub 2019 Jul 31.
A nested case-control (NCC) design within a prospective cohort study can realize substantial benefits for biomarker studies. In this context, it is natural to consider sample availability when selecting controls, so as to minimize data loss when implementing the design. However, doing so violates the random selection that the design requires and leads to biased analyses. Inverse probability weighting may improve the analysis, but the current approach, weighted Cox regression, fails to maintain the benefits of the NCC design.
This paper introduces weighted conditional logistic regression. We illustrate the proposed analysis using data recently investigated in The Environmental Determinants of Diabetes in the Young (TEDDY) study. To limit potential data loss, the TEDDY NCC design was moderately selective in its choice of controls. A data-driven simulation study was performed to demonstrate the bias correction relative to an analysis that ignores nonrandom control selection.
In the TEDDY data analysis, the standard analysis using conditional logistic regression yielded a parameter estimate of -0.015 (-0.023, -0.007). The biased estimate using Cox regression was -0.011 (95% confidence interval: -0.019, -0.003). Weighted Cox regression estimated -0.013 (-0.026, 0.0004). The proposed weighted conditional logistic regression estimated -0.020 (-0.033, -0.007), a stronger negative effect than the estimate from standard conditional logistic regression. The simulation study also showed that the standard estimate of β, which ignores the nonrandom control selection, tends to be greater than the true β (that is, positive relative bias).
Weighted conditional logistic regression can enhance the analysis by offering flexibility in the selection of controls, while maintaining the matching.
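As a rough illustration of how inverse probability weights can enter a conditional logistic likelihood while keeping the matched sets intact, the sketch below fits a single coefficient β. It assumes one common formulation, in which each subject's weight multiplies its term in the within-set likelihood (equivalently, log-weight is added as an offset); the abstract does not state the exact likelihood used, so this is not necessarily the authors' method. The function names, the toy data, and the weights are all hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def weighted_clogit_negloglik(beta, x, is_case, set_id, w):
    """Negative weighted conditional log-likelihood for one covariate.

    Assumed form (not taken from the paper): within each matched set,
    the case contributes w_case * exp(beta * x_case) divided by the sum
    of w_j * exp(beta * x_j) over all members of the set.
    """
    total = 0.0
    for s in np.unique(set_id):
        in_set = set_id == s
        # The log-weight acts as an offset on the linear predictor.
        eta = beta * x[in_set] + np.log(w[in_set])
        case_eta = eta[is_case[in_set]][0]  # exactly one case per set
        total += case_eta - np.logaddexp.reduce(eta)
    return -total


def fit_beta(x, is_case, set_id, w, bounds=(-10.0, 10.0)):
    """Maximize the weighted conditional likelihood over a bounded range."""
    result = minimize_scalar(
        weighted_clogit_negloglik,
        bounds=bounds,
        method="bounded",
        args=(x, is_case, set_id, w),
    )
    return result.x


# Hypothetical toy data: four 1:1 matched sets, case listed first in each.
x = np.array([1.0, 0.0, 2.0, 1.0, 0.0, 1.0, 3.0, 1.0])
is_case = np.array([True, False] * 4)
set_id = np.repeat(np.arange(4), 2)
# Hypothetical inverse selection probabilities; cases get weight 1.
w = np.array([1.0, 2.0, 1.0, 1.5, 1.0, 3.0, 1.0, 1.0])

beta_hat = fit_beta(x, is_case, set_id, w)
print(beta_hat)
```

Setting every weight to 1 reduces this to ordinary conditional logistic regression, which makes it easy to compare weighted and unweighted estimates on the same matched sets.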