Duo Jiang, Sheng Zhong, Mary Sara McPeek
Department of Statistics, Oregon State University, Corvallis, OR 97331, USA.
Department of Statistics, University of Chicago, Chicago, IL 60637, USA.
Am J Hum Genet. 2016 Feb 4;98(2):243-55. doi: 10.1016/j.ajhg.2015.12.012. Epub 2016 Jan 28.
In genetic association testing, failure to properly control for population structure can lead to severely inflated type I error and loss of power. Meanwhile, adjustment for relevant covariates is often desirable and sometimes necessary to protect against spurious association and to improve power. Many recent methods that account for population structure and covariates are based on linear mixed models (LMMs), which are primarily designed for quantitative traits. For binary traits, however, the LMM is a misspecified model and can lead to degraded performance. We propose CARAT, a binary-trait association testing approach based on a mixed-effects quasi-likelihood framework, which exploits the dichotomous nature of the trait and achieves computational efficiency through estimating equations. We show in simulation studies that CARAT consistently outperforms existing methods and maintains high power across a wide range of population structure settings and trait models. Furthermore, CARAT is based on a retrospective approach, which is robust to misspecification of the phenotype model. We apply our approach to a genome-wide analysis of Crohn disease, in which we replicate association with 17 previously identified regions. Moreover, our analysis of 5p13.1, an extensively reported region of association, shows evidence for the presence of multiple independent association signals in the region. This example shows how CARAT can leverage known disease risk factors to shed light on the genetic architecture of complex traits.
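The abstract describes CARAT only at a high level, so the sketch below is not CARAT. It is a simplified, hypothetical illustration of the retrospective idea: the phenotype vector and covariates define a fixed weight vector, and the test's null distribution comes from a model for the genotypes rather than for the phenotype. Assumptions, all ours: under the null, genotypes have mean 2p and covariance 2p(1-p)Φ for a known relationship matrix Φ, and are not associated with the covariates (no stratification beyond what Φ captures); the null logistic fit treats samples as independent; the working phenotype covariance mixes Φ and I with a hand-picked weight `xi`. The names `fit_null_logistic`, `retrospective_score_test`, and `xi` are hypothetical, and the demo data are simulated.

```python
import math

import numpy as np


def fit_null_logistic(X, y, max_iter=50, tol=1e-10):
    """Fit logistic regression of y on covariates X by IRLS (Newton-Raphson).

    Simplifying assumption: samples are treated as independent in the null fit.
    Returns the fitted probabilities mu.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        w = mu * (1.0 - mu)
        # Newton step: (X^T W X) step = X^T (y - mu)
        step = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return 1.0 / (1.0 + np.exp(-X @ beta))


def retrospective_score_test(G, y, X, Phi, xi=0.5):
    """Simplified retrospective score-type test for one SNP (hypothetical, not CARAT).

    G   : genotypes coded 0/1/2, length n
    y   : binary phenotype 0/1, length n
    X   : n x k covariate matrix including an intercept column
    Phi : n x n genetic relationship matrix (diagonal 1 for non-inbred)
    xi  : hypothetical fixed weight mixing Phi and I in the working
          phenotype covariance; it affects power, not validity
    """
    n = len(y)
    mu = fit_null_logistic(X, y)
    w = mu * (1.0 - mu)
    s = np.sqrt(w)
    # Assumed working covariance: Var(Y) ~ W^{1/2} (xi*Phi + (1-xi)*I) W^{1/2}
    V = s[:, None] * (xi * Phi + (1.0 - xi) * np.eye(n)) * s[None, :]
    # Quasi-likelihood score direction: a = W V^{-1} (y - mu)
    a = w * np.linalg.solve(V, y - mu)
    # Centering makes c orthogonal to 1, so the genotype mean 2p*1 cancels exactly
    c = a - a.mean()
    p_hat = G.mean() / 2.0
    # Retrospective variance under the assumed null genotype model Var(G) = 2p(1-p) Phi
    var_u = 2.0 * p_hat * (1.0 - p_hat) * (c @ Phi @ c)
    T = float((c @ G) ** 2 / var_u)
    # Chi-square(1) upper tail: P(chi2_1 > T) = erfc(sqrt(T / 2))
    return T, math.erfc(math.sqrt(T / 2.0))


# Demo on simulated unrelated individuals (Phi = I); all values are synthetic.
rng = np.random.default_rng(1)
n = 400
z = rng.standard_normal(n)                           # one covariate
X = np.column_stack([np.ones(n), z])
G = rng.binomial(2, 0.3, size=n).astype(float)       # causal SNP
eta = -0.5 + 0.5 * z + 1.0 * (G - 0.6)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta))).astype(float)
G_null = rng.binomial(2, 0.3, size=n).astype(float)  # SNP unrelated to y

T, p = retrospective_score_test(G, y, X, np.eye(n))
T_null, p_null = retrospective_score_test(G_null, y, X, np.eye(n))
print(f"causal SNP: T={T:.2f}, p={p:.2e}; null SNP: T={T_null:.2f}, p={p_null:.2f}")
```

Because the weight vector `c` depends only on `y`, `X`, and `Phi`, the chi-square approximation in this sketch rests on the genotype model, not the phenotype model: a poor choice of `xi` or of the logistic null model should cost power rather than inflate type I error. This mirrors, in simplified form, the robustness argument the abstract makes for the retrospective approach. For related samples, `Phi` would be replaced by a kinship-based or empirical relationship matrix.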