Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA.
Genet Epidemiol. 2011 Apr;35(3):190-200. doi: 10.1002/gepi.20568. Epub 2011 Feb 9.
Genetic association studies of binary diseases are typically designed as case-control studies, in which the cases are individuals affected by the primary disease and the controls are free of it. Information on secondary phenotypes is also collected when cases and controls are recruited, and associations between secondary phenotypes and genetic variants have recently received considerable interest. To study a secondary phenotype, investigators use standard regression approaches, coding individuals with the secondary phenotype as cases and those without it as controls. However, using the secondary phenotype as the outcome variable in a case-control study can bias the estimated odds ratios (ORs) for genetic variants. Because the secondary phenotype is associated with the primary disease, individuals with and without the secondary phenotype are not sampled according to the principles of a case-control study. In this article, we demonstrate that such analyses lead to biased OR estimates, and we propose new approaches that provide more accurate OR estimates for genetic variants associated with the secondary phenotype in both unmatched and frequency-matched (with respect to the secondary phenotype) case-control studies. We also propose a bootstrap method to estimate empirical confidence intervals for the corrected ORs. Using simulation studies and an analysis of lung cancer data on single-nucleotide polymorphism (SNP) associations with smoking quantity, we compared our new approaches with standard logistic regression and with an extended version of inverse-probability-of-sampling-weighted regression. The proposed approaches provided more accurate estimates of the true OR.
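The bias mechanism described in the abstract can be reproduced with expected (noise-free) cell counts under a simple hypothetical model. The sketch below assumes a 0/1 carrier indicator G, logistic models logit P(S=1|G) = a0 + a1·G for the secondary phenotype and logit P(D=1|G,S) = b0 + b1·G + b2·S for the primary disease, and random sampling of fixed numbers of cases and controls; all parameter values and the function names `expected_tables` and `odds_ratio` are illustrative assumptions. It does not reproduce the article's corrected estimators; as a stand-in it uses inverse-probability-of-sampling weighting (the comparison method named in the abstract), assuming the sampling fractions are known.

```python
import math

def expit(x: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def odds_ratio(table: dict) -> float:
    """OR of the secondary phenotype S on carrier status G; table is keyed by (g, s)."""
    return (table[(1, 1)] * table[(0, 0)]) / (table[(1, 0)] * table[(0, 1)])

def expected_tables(p_carrier, a0, a1, b0, b1, b2, n_cases, n_controls):
    """Expected (g, s) tables: population, naive case-control sample, IPW-weighted sample.

    Hypothetical models: logit P(S=1|G) = a0 + a1*G and
    logit P(D=1|G,S) = b0 + b1*G + b2*S, with G a 0/1 carrier indicator.
    """
    joint = {}  # P(G=g, S=s, D=d) in the population
    for g in (0, 1):
        p_g = p_carrier if g else 1.0 - p_carrier
        p_s1 = expit(a0 + a1 * g)
        for s in (0, 1):
            p_s = p_s1 if s else 1.0 - p_s1
            p_d1 = expit(b0 + b1 * g + b2 * s)
            joint[(g, s, 1)] = p_g * p_s * p_d1
            joint[(g, s, 0)] = p_g * p_s * (1.0 - p_d1)
    p_d = {d: sum(v for k, v in joint.items() if k[2] == d) for d in (0, 1)}
    n_d = {0: n_controls, 1: n_cases}
    population, naive, ipw = {}, {}, {}
    for g in (0, 1):
        for s in (0, 1):
            population[(g, s)] = joint[(g, s, 0)] + joint[(g, s, 1)]
            # Expected counts when n_d subjects are drawn at random from disease stratum d
            counts = {d: n_d[d] * joint[(g, s, d)] / p_d[d] for d in (0, 1)}
            # Naive analysis: pool cases and controls and ignore how they were sampled
            naive[(g, s)] = counts[0] + counts[1]
            # IPW: weight stratum d by 1 / sampling fraction, which is proportional to p_d / n_d
            ipw[(g, s)] = sum(counts[d] * p_d[d] / n_d[d] for d in (0, 1))
    return population, naive, ipw

if __name__ == "__main__":
    # Illustrative values only: carrier OR for S is 1.5; G and S both raise disease risk
    pop, naive, ipw = expected_tables(0.3, -1.0, math.log(1.5), -4.0,
                                      math.log(2.0), math.log(3.0), 2000, 2000)
    for name, table in (("population", pop), ("naive", naive), ("IPW", ipw)):
        print(f"{name:>10} OR = {odds_ratio(table):.3f}")
```

With these values the population OR is exactly exp(a1) = 1.5, the naive pooled analysis gives roughly 1.7, and the weighted table recovers 1.5. Setting b1 = 0 (G affects disease only through S) makes the naive OR equal 1.5 again, because the sampling probability then depends on S alone and cancels out of the odds ratio.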
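The abstract proposes bootstrapping to obtain empirical confidence intervals for corrected ORs. The sketch below shows one common way to do this, a percentile bootstrap, applied to a weighted (inverse-probability-of-sampling) OR rather than to the article's own corrected estimator, which is not reproduced here. Assumptions, all hypothetical: a simulated population with a 0/1 carrier indicator G, a binary secondary phenotype S and a primary disease D; cases and controls resampled separately so each replicate keeps the case-control design; and sampling weights treated as known and fixed across replicates. The helper names `draw_case_control`, `ipw_odds_ratio` and `bootstrap_ci` are illustrative.

```python
import math
import random
from collections import Counter

CELLS = [(0, 0), (0, 1), (1, 0), (1, 1)]  # (carrier status g, secondary phenotype s)

def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def draw_case_control(n_cases, n_controls, n_pop, rng):
    """Simulate a hypothetical population, then sample cases and controls at random.

    Illustrative model only: P(G=1) = 0.3,
    logit P(S=1|G) = -1 + log(1.5) G, logit P(D=1|G,S) = -4 + log(2) G + log(3) S.
    """
    pop_cases, pop_controls = [], []
    for _ in range(n_pop):
        g = int(rng.random() < 0.3)
        s = int(rng.random() < expit(-1.0 + math.log(1.5) * g))
        d = rng.random() < expit(-4.0 + math.log(2.0) * g + math.log(3.0) * s)
        (pop_cases if d else pop_controls).append((g, s))
    # Inverse sampling fractions; assumes the population counts (prevalence) are known
    w_case = len(pop_cases) / n_cases
    w_control = len(pop_controls) / n_controls
    return rng.sample(pop_cases, n_cases), rng.sample(pop_controls, n_controls), w_case, w_control

def ipw_odds_ratio(cases, controls, w_case, w_control):
    """OR of S on G from a 2x2 table in which every subject carries its sampling weight."""
    n_case, n_control = Counter(cases), Counter(controls)
    t = {c: w_case * n_case[c] + w_control * n_control[c] for c in CELLS}
    return (t[(1, 1)] * t[(0, 0)]) / (t[(1, 0)] * t[(0, 1)])

def bootstrap_ci(cases, controls, w_case, w_control, n_boot=1000, level=0.95, seed=1):
    """Percentile bootstrap CI; cases and controls are resampled separately."""
    rng = random.Random(seed)
    stats = sorted(
        ipw_odds_ratio(rng.choices(cases, k=len(cases)),
                       rng.choices(controls, k=len(controls)),
                       w_case, w_control)
        for _ in range(n_boot)
    )
    alpha = 1.0 - level
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi

if __name__ == "__main__":
    rng = random.Random(2011)
    cases, controls, w1, w0 = draw_case_control(1000, 1000, 100_000, rng)
    est = ipw_odds_ratio(cases, controls, w1, w0)
    lo, hi = bootstrap_ci(cases, controls, w1, w0, n_boot=500)
    print(f"IPW OR {est:.2f} (95% percentile bootstrap CI {lo:.2f} to {hi:.2f})")
```

Resampling within the case and control strata, rather than from the pooled sample, keeps the number of cases and controls fixed in every replicate, mirroring how the original sample was drawn; other resampling schemes (for example, ones that also propagate uncertainty in the estimated prevalence) are possible.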