Jiang Longda, Zheng Zhili, Fang Hailing, Yang Jian
Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia.
School of Life Sciences, Westlake University, Hangzhou, China.
Nat Genet. 2021 Nov;53(11):1616-1621. doi: 10.1038/s41588-021-00954-4. Epub 2021 Nov 4.
Compared with linear mixed model-based genome-wide association (GWA) methods, generalized linear mixed model (GLMM)-based methods have better statistical properties when applied to binary traits but are computationally much slower. In the present study, leveraging efficient sparse matrix-based algorithms, we developed a GLMM-based GWA tool, fastGWA-GLMM, that is severalfold to orders of magnitude faster than the state-of-the-art tools when applied to the UK Biobank (UKB) data and scalable to cohorts with millions of individuals. We show by simulation that the fastGWA-GLMM test statistics of both common and rare variants are well calibrated under the null, even for traits with extreme case-control ratios. We applied fastGWA-GLMM to the UKB data of 456,348 individuals, 11,842,647 variants and 2,989 binary traits (full summary statistics available at http://fastgwa.info/ukbimpbin), and identified 259 rare variants associated with 75 traits, demonstrating the use of imputed genotype data in a large cohort to discover rare variants for binary complex traits.