Beyond guilty by association at scale: searching for causal variants on the basis of genome-wide summary statistics.

Author information

He Zihuai, Chu Benjamin, Yang James, Gu Jiaqi, Chen Zhaomeng, Liu Linxi, Morrison Tim, Belloy Michael E, Qi Xinran, Hejazi Nima, Mathur Maya, Le Guen Yann, Tang Hua, Hastie Trevor, Ionita-Laza Iuliana, Candès Emmanuel, Sabatti Chiara

Affiliations

Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA 94305, USA.

Quantitative Sciences Unit, Department of Medicine, Stanford University, Stanford, CA 94305, USA.

Publication information

bioRxiv. 2025 Feb 26:2024.02.28.582621. doi: 10.1101/2024.02.28.582621.

DOI:10.1101/2024.02.28.582621

PMID:38464202

Original link: https://pmc.ncbi.nlm.nih.gov/articles/PMC10925326/

Abstract

Understanding the causal genetic architecture of complex phenotypes will fuel future research into disease mechanisms and potential therapies. Here, we illustrate the power of a novel framework: it detects, starting from summary statistics, and across the entire genome, sets of variants that carry non-redundant information on the phenotypes and are therefore more likely to be causal in a biological sense. The approach, implemented in open-source software, is also computationally efficient, requiring less than 15 minutes on a single CPU to perform a genome-wide analysis. Through extensive genome-wide simulation studies, we show that the method can substantially outperform existing methods in false discovery rate control, statistical power and various fine-mapping criteria. In applications to a meta-analysis of ten large-scale genetic studies of Alzheimer's disease (AD), we identified 82 loci associated with AD, including 37 additional loci missed by the conventional GWAS pipeline. Massively parallel reporter assays and CRISPR-Cas9 experiments have confirmed the functionality of the putative causal variants our method points to. Finally, we retrospectively analyzed summary statistics from 67 large-scale GWAS for a variety of phenotypes. The results reveal the method's capacity to robustly discover additional loci for polygenic traits and pinpoint potential causal variants underpinning each locus beyond the conventional GWAS pipeline, contributing to a deeper understanding of complex genetic architectures in post-GWAS analyses.
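For readers unfamiliar with the "false discovery rate control" criterion mentioned above, the following is a minimal sketch of what it means when working from summary statistics alone. It is not the method of the paper: it applies the standard Benjamini-Hochberg procedure to marginal z-scores, which is exactly the kind of association-only testing the framework aims to go beyond. The z-scores, the set of true signals, and all function names are hypothetical and chosen for illustration only.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a z-score under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def benjamini_hochberg(p_values, q=0.10):
    """Indices rejected by the Benjamini-Hochberg step-up rule at target FDR q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0  # largest rank whose p-value falls under its BH threshold
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= q * rank / m:
            k = rank
    return sorted(order[:k])

def false_discovery_proportion(rejected, true_signals):
    """Fraction of rejections that are not true signals (0 if nothing is rejected)."""
    if not rejected:
        return 0.0
    n_false = sum(1 for i in rejected if i not in true_signals)
    return n_false / len(rejected)

# Hypothetical summary statistics: one z-score per variant (assumed values).
z_scores = [5.2, 0.3, -4.8, 1.1, -0.7, 2.5, 0.2, -1.5]
# Hypothetical ground truth, known only in a simulation study.
true_signals = {0, 2}

p_values = [two_sided_p(z) for z in z_scores]
rejected = benjamini_hochberg(p_values, q=0.10)
fdp = false_discovery_proportion(rejected, true_signals)

print("rejected variants:", rejected)
print("false discovery proportion:", round(fdp, 3))
```

In a simulation study, the false discovery rate is estimated by averaging this proportion over many replicates, and a method is said to control the FDR at level q when that average stays at or below q.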
