Wang Yuanjia, Yang Qiong, Rabinowitz Daniel
Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, New York 10032, USA.
Biometrics. 2011 Jun;67(2):331-43. doi: 10.1111/j.1541-0420.2010.01454.x. Epub 2010 Jun 16.
Population admixture can be a confounding factor in genetic association studies. Family-based methods (Rabinowitz and Laird, 2000, Human Heredity 50, 211-223) have been proposed in both testing and estimation settings to adjust for this confounding, especially in case-only association studies. These methods rely on conditioning on the observed parental genotypes or on the minimal sufficient statistic for the genetic model under the null hypothesis. In some cases the conditioning is too stringent, and the methods do not capture all the available information. General efficient methods that adjust for population admixture while using all the available information have been proposed (Rabinowitz, 2002, Journal of the American Statistical Association 97, 742-758). However, these approaches may not be easy to implement in some situations. A previously developed, easy-to-compute approach adjusts for admixture by adding supplemental covariates to linear models (Yang et al., 2000, Human Heredity 50, 227-233). Here it is shown that this strategy of augmenting linear models with appropriate covariates can be combined with the general efficient methods in Rabinowitz (2002) to provide computationally tractable and locally efficient adjustment. Once the optimal covariates have been derived, the adjusted analysis can be carried out using standard statistical software packages such as SAS or R. The proposed methods are locally efficient, that is, efficient in a neighborhood of the true model. Simulation studies show that nontrivial efficiency gains can be obtained by using information not accessible to the methods that rely on conditioning on the minimal sufficient statistics. The approaches are illustrated through an analysis of the influence of apolipoprotein E (APOE) genotype on plasma low-density lipoprotein (LDL) concentration in children.
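The general idea of adjusting for admixture by adding a supplemental covariate to a linear model can be illustrated with a minimal simulation sketch. Everything in it is an assumption made for illustration: the two-subpopulation design, the allele frequencies, the trait shift, and in particular the choice of the midparent genotype (the expected offspring genotype given the parents) as the supplemental covariate. It is not the optimal covariate derived in the paper, and the helper names `simulate_families` and `ols_slope` are hypothetical.

```python
import numpy as np


def simulate_families(n_families, beta, rng):
    """Simulate parent-offspring trios from two subpopulations (illustrative setup)."""
    # The subpopulation label is the confounder: it shifts both the allele
    # frequency and the trait mean.
    pop = rng.integers(0, 2, size=n_families)
    allele_freq = np.where(pop == 0, 0.2, 0.6)
    trait_shift = np.where(pop == 0, 0.0, 1.0)
    # Parental genotypes are allele counts (0, 1, 2) drawn within each subpopulation.
    father = rng.binomial(2, allele_freq)
    mother = rng.binomial(2, allele_freq)
    # Mendelian transmission: each parent passes the variant allele with
    # probability genotype / 2.
    child = rng.binomial(1, father / 2.0) + rng.binomial(1, mother / 2.0)
    # Quantitative trait with true genetic effect beta plus the subpopulation shift.
    trait = trait_shift + beta * child + rng.normal(0.0, 1.0, size=n_families)
    midparent = (father + mother) / 2.0
    return child, midparent, trait


def ols_slope(design_columns, response):
    """Return the least-squares coefficient of the first listed column (intercept included)."""
    X = np.column_stack([np.ones(len(response))] + list(design_columns))
    coef, *_ = np.linalg.lstsq(X, response, rcond=None)
    return coef[1]


rng = np.random.default_rng(0)
# beta = 0.0: the genotype has no effect, so any nonzero estimate reflects confounding.
child, midparent, trait = simulate_families(20000, beta=0.0, rng=rng)

naive = ols_slope([child], trait)                # genotype only
adjusted = ols_slope([child, midparent], trait)  # genotype plus supplemental covariate

print(f"naive estimate:    {naive:.3f}")
print(f"adjusted estimate: {adjusted:.3f}")
```

In this setup the unadjusted estimate is pulled away from zero by the subpopulation difference, while the augmented model recovers an estimate near the true value, because the offspring genotype minus the midparent genotype depends only on random Mendelian transmission and is therefore uncorrelated with subpopulation membership.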