Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts 02118, USA.
Genet Epidemiol. 2011 Nov;35(7):592-6. doi: 10.1002/gepi.20607. Epub 2011 Jul 18.
Association studies of risk factors and complex diseases require careful assessment of potential confounding factors. Two-stage regression analysis, sometimes referred to as residual-outcome or adjusted-outcome analysis, has been increasingly used in association studies of single nucleotide polymorphisms (SNPs) and quantitative traits. In this analysis, a residual outcome is first calculated from a regression of the outcome variable on the covariates; the relationship between this adjusted outcome and the SNP is then evaluated by a simple linear regression of the adjusted outcome on the SNP. In this article, we examine the performance of this two-stage analysis as compared with multiple linear regression (MLR) analysis. Our findings show that when a SNP and a covariate are correlated, the two-stage approach results in a biased estimate of the genotypic effect and a loss of power. The bias is always toward the null and increases with the squared correlation between the SNP and the covariate (r²). For example, for r² = 0, 0.1, and 0.5, two-stage analysis results in, respectively, 0, 10, and 50% attenuation in the SNP effect. As expected, MLR was always unbiased. Since individual SNPs often show little or no correlation with covariates, a two-stage analysis is expected to perform as well as MLR in many genetic studies; however, it produces considerably different results from MLR and may lead to incorrect conclusions when the independent variables are highly correlated. While a useful alternative to MLR when r² is close to 0, the two-stage approach has serious limitations. Its use as a simple substitute for MLR should be avoided.
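The attenuation described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: the function name simulate_attenuation, the effect sizes, the sample size, the minor allele frequency, and the use of NumPy least squares are all assumptions made for illustration. It generates an additively coded SNP, builds a covariate whose squared correlation with the SNP is approximately r², and compares the SNP slope from MLR with the slope from the two-stage (residual-outcome) analysis.

```python
import numpy as np

def simulate_attenuation(r2, beta_snp=1.0, beta_cov=0.5,
                         n=200_000, maf=0.3, seed=0):
    """Return (mlr_slope, two_stage_slope) for the SNP effect.

    All parameter values are illustrative assumptions, not taken
    from the article.
    """
    rng = np.random.default_rng(seed)

    # SNP coded as 0/1/2 copies of the minor allele.
    g = rng.binomial(2, maf, size=n).astype(float)
    g_std = (g - g.mean()) / g.std()

    # Covariate with squared correlation ~ r2 with the SNP.
    x = np.sqrt(r2) * g_std + np.sqrt(1.0 - r2) * rng.standard_normal(n)

    # Quantitative trait depends on both the SNP and the covariate.
    y = beta_snp * g + beta_cov * x + rng.standard_normal(n)

    ones = np.ones(n)

    # MLR: regress y on the SNP and the covariate jointly.
    X_full = np.column_stack([ones, g, x])
    mlr_slope = np.linalg.lstsq(X_full, y, rcond=None)[0][1]

    # Stage 1: regress y on the covariate only and keep the residuals.
    X_cov = np.column_stack([ones, x])
    resid = y - X_cov @ np.linalg.lstsq(X_cov, y, rcond=None)[0]

    # Stage 2: simple linear regression of the residuals on the SNP.
    X_snp = np.column_stack([ones, g])
    two_stage_slope = np.linalg.lstsq(X_snp, resid, rcond=None)[0][1]

    return mlr_slope, two_stage_slope

if __name__ == "__main__":
    for r2 in (0.0, 0.1, 0.5):
        mlr, two_stage = simulate_attenuation(r2)
        print(f"r2={r2:.1f}  MLR={mlr:.3f}  two-stage={two_stage:.3f}")
```

With a true SNP effect of 1.0, the MLR slope should stay close to 1.0 for every r², while the two-stage slope should fall close to 1 - r², in line with the 0, 10, and 50% attenuation reported for r² = 0, 0.1, and 0.5. Exact values vary with the seed and sample size.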