Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA.
Nat Genet. 2018 Sep;50(9):1335-1341. doi: 10.1038/s41588-018-0184-y. Epub 2018 Aug 13.
In genome-wide association studies (GWAS) for thousands of phenotypes in large biobanks, most binary traits have substantially fewer cases than controls. Both of the widely used approaches, the linear mixed model and the recently proposed logistic mixed model, perform poorly; they produce large type I error rates when used to analyze unbalanced case-control phenotypes. Here we propose a scalable and accurate generalized mixed model association test that uses the saddlepoint approximation to calibrate the distribution of score test statistics. This method, SAIGE (Scalable and Accurate Implementation of GEneralized mixed model), provides accurate P values even when case-control ratios are extremely unbalanced. SAIGE uses state-of-the-art optimization strategies to reduce computational costs; hence, it is applicable to GWAS for thousands of phenotypes in large biobanks. Through the analysis of UK Biobank data from 408,961 samples of white British participants with European ancestry for more than 1,400 binary phenotypes, we show that SAIGE can efficiently analyze large-sample data while controlling for unbalanced case-control ratios and sample relatedness.
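To illustrate why a saddlepoint approximation helps with unbalanced case-control data, the following is a minimal sketch, not SAIGE's implementation. It assumes independent samples, fitted case probabilities `mu` that are already known, and a score statistic of the form S = sum_i g_i (y_i - mu_i); the mixed-model fitting and relatedness adjustment described in the abstract are left out. The function names `spa_pvalue` and `normal_pvalue` are hypothetical, and the tail formula is the standard Barndorff-Nielsen form of the saddlepoint approximation.

```python
import math


def _norm_sf(x):
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def _cgf(t, g, mu):
    """Return K(t), K'(t), K''(t) for S = sum_i g_i * (y_i - mu_i),
    with independent y_i ~ Bernoulli(mu_i) and 0 < mu_i < 1."""
    k0 = k1 = k2 = 0.0
    for gi, mi in zip(g, mu):
        z = gi * t
        # Evaluate log(1 - mu + mu * e^z) and the tilted probability
        # in a form that cannot overflow for large |z|.
        if z >= 0:
            e = math.exp(-z)
            log_mgf = z + math.log(mi + (1.0 - mi) * e)
            p = mi / (mi + (1.0 - mi) * e)
        else:
            e = math.exp(z)
            log_mgf = math.log(1.0 - mi + mi * e)
            p = mi * e / (1.0 - mi + mi * e)
        k0 += log_mgf - z * mi
        k1 += gi * (p - mi)
        k2 += gi * gi * p * (1.0 - p)
    return k0, k1, k2


def normal_pvalue(s, g, mu):
    """Upper-tail P(S >= s) from the usual normal approximation."""
    var = sum(gi * gi * mi * (1.0 - mi) for gi, mi in zip(g, mu))
    return _norm_sf(s / math.sqrt(var))


def spa_pvalue(s, g, mu):
    """Upper-tail P(S >= s) from the saddlepoint approximation."""
    # Bracket the saddlepoint t_hat solving K'(t_hat) = s; K' is increasing.
    lo, hi = -1.0, 1.0
    for _ in range(60):
        if _cgf(lo, g, mu)[1] <= s <= _cgf(hi, g, mu)[1]:
            break
        lo, hi = 2.0 * lo, 2.0 * hi
    else:
        raise ValueError("s lies outside the attainable range of the statistic")
    # Plain bisection is enough for a sketch.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if _cgf(mid, g, mu)[1] < s:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    if abs(t) < 1e-4:
        # Near the mean the formula is numerically unstable; use the normal tail.
        return normal_pvalue(s, g, mu)
    k0, _, k2 = _cgf(t, g, mu)
    w = math.copysign(math.sqrt(2.0 * (t * s - k0)), t)
    v = t * math.sqrt(k2)
    return _norm_sf(w + math.log(v / w) / w)


if __name__ == "__main__":
    # Hypothetical toy data: 1,000 samples, 1% expected cases, 25 observed.
    g = [1.0] * 1000
    mu = [0.01] * 1000
    s = 25 - sum(mu)
    print("normal:", normal_pvalue(s, g, mu))
    print("saddlepoint:", spa_pvalue(s, g, mu))
```

In this toy setting the statistic is strongly right-skewed, so the normal approximation reports a much smaller upper-tail P value than the saddlepoint approximation, which is the kind of miscalibration that inflates type I error for rare binary traits. With balanced probabilities (for example `mu = [0.5] * 1000`) the two approximations nearly coincide.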