Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA.
Department of Statistics, University of Wisconsin-Madison, Madison, WI, USA.
Nat Genet. 2024 Nov;56(11):2361-2369. doi: 10.1038/s41588-024-01934-0. Epub 2024 Sep 30.
Machine learning (ML) has become increasingly popular in almost all scientific disciplines, including human genetics. Owing to challenges related to sample collection and precise phenotyping, ML-assisted genome-wide association studies (GWAS), which use sophisticated ML techniques to impute phenotypes and then perform GWAS on the imputed outcomes, have become increasingly common in complex trait genetics research. However, the validity of ML-assisted GWAS associations has not been carefully evaluated. Here, we report pervasive risks for false-positive associations in ML-assisted GWAS and introduce Post-Prediction GWAS (POP-GWAS), a statistical framework that redesigns GWAS on ML-imputed outcomes. POP-GWAS ensures valid and powerful statistical inference irrespective of imputation quality and choice of algorithm, requiring only GWAS summary statistics as input. We employed POP-GWAS to perform a GWAS of bone mineral density derived from dual-energy X-ray absorptiometry imaging at 14 skeletal sites, identifying 89 new loci and revealing skeletal site-specific genetic architecture. Our framework offers a robust analytic solution for future ML-assisted GWAS.