Zhong Sheng, Jiang Duo, McPeek Mary Sara
Department of Statistics, University of Chicago, Chicago, Illinois, United States of America.
Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America.
PLoS Genet. 2016 Oct 3;12(10):e1006329. doi: 10.1371/journal.pgen.1006329. eCollection 2016 Oct.
We consider the problem of genetic association testing of a binary trait in a sample that contains related individuals, where we adjust for relevant covariates and allow for missing data. We propose CERAMIC, an estimating equation approach that can be viewed as a hybrid of logistic regression and linear mixed-effects model (LMM) approaches. CERAMIC extends the recently proposed CARAT method to allow samples with related individuals and to incorporate partially missing data. In simulations, we show that CERAMIC outperforms existing LMM and generalized LMM approaches, maintaining high power and correct type 1 error across a wider range of scenarios. CERAMIC results in a particularly large power increase over existing methods when the sample includes related individuals with some missing data (e.g., when some individuals with phenotype and covariate information have missing genotype), because CERAMIC is able to make use of the relationship information to incorporate partially missing data in the analysis while correcting for dependence. Because CERAMIC is based on a retrospective analysis, it is robust to misspecification of the phenotype model, resulting in better control of type 1 error and higher power than that of prospective methods, such as GMMAT, when the phenotype model is misspecified. CERAMIC is computationally efficient for genomewide analysis in samples of related individuals of almost any configuration, including small families, unrelated individuals and even large, complex pedigrees. We apply CERAMIC to data on type 2 diabetes (T2D) from the Framingham Heart Study. In a genome scan, 9 of the 10 smallest CERAMIC p-values occur in or near either known T2D susceptibility loci or plausible candidates, verifying that CERAMIC is able to home in on the important loci in a genome scan.