Estimating Effect Sizes and Expected Replication Probabilities from GWAS Summary Statistics.

Author information

Holland Dominic, Wang Yunpeng, Thompson Wesley K, Schork Andrew, Chen Chi-Hua, Lo Min-Tzu, Witoelar Aree, Werge Thomas, O'Donovan Michael, Andreassen Ole A, Dale Anders M

Affiliations

Multimodal Imaging Laboratory, University of California San Diego, La Jolla, CA, USA; Department of Neurosciences, University of California San Diego, La Jolla, CA, USA.

Multimodal Imaging Laboratory, University of California San Diego, La Jolla, CA, USA; Department of Neurosciences, University of California San Diego, La Jolla, CA, USA; NORMENT, KG Jebsen Centre for Psychosis Research, Institute of Clinical Medicine, University of Oslo, Oslo, Norway; Division of Mental Health and Addiction, Oslo University Hospital, Oslo, Norway.

Publication information

Front Genet. 2016 Feb 16;7:15. doi: 10.3389/fgene.2016.00015. eCollection 2016.

Abstract

Genome-wide Association Studies (GWAS) result in millions of summary statistics ("z-scores") for single nucleotide polymorphism (SNP) associations with phenotypes. These rich datasets afford deep insights into the nature and extent of genetic contributions to complex phenotypes such as psychiatric disorders, which are understood to have substantial genetic components that arise from very large numbers of SNPs. The complexity of the datasets, however, poses a significant challenge to maximizing their utility. This is reflected in a need for better understanding the landscape of z-scores, as such knowledge would enhance causal SNP and gene discovery, help elucidate mechanistic pathways, and inform future study design. Here we present a parsimonious methodology for modeling effect sizes and replication probabilities, relying only on summary statistics from GWAS substudies, and a scheme allowing for direct empirical validation. We show that modeling z-scores as a mixture of Gaussians is conceptually appropriate, in particular taking into account ubiquitous non-null effects that are likely in the datasets due to weak linkage disequilibrium with causal SNPs. The four-parameter model allows for estimating the degree of polygenicity of the phenotype and predicting the proportion of chip heritability explainable by genome-wide significant SNPs in future studies with larger sample sizes. We apply the model to recent GWAS of schizophrenia (N = 82,315) and putamen volume (N = 12,596), with approximately 9.3 million SNP z-scores in both cases. We show that, over a broad range of z-scores and sample sizes, the model accurately predicts expectation estimates of true effect sizes and replication probabilities in multistage GWAS designs. We assess the degree to which effect sizes are over-estimated when based on linear-regression association coefficients. 
We estimate the polygenicity of schizophrenia to be 0.037 and that of putamen volume to be 0.001, while the respective sample sizes required to approach fully explaining the chip heritability are 10⁶ and 10⁵. The model can be extended to incorporate prior knowledge such as pleiotropy and SNP annotation. The current findings suggest that the model is applicable to a broad array of complex phenotypes and will enhance understanding of their genetic architectures.
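
To make the mixture-of-Gaussians idea concrete, the following is a minimal sketch and not the authors' four-parameter model. It assumes a simplified two-component form: each observed z-score is a true effect plus Gaussian noise with standard deviation `sigma0`, and the true effect is zero with probability `1 - pi1` or drawn from a zero-mean Gaussian with standard deviation `sigma1` with probability `pi1`. The function names and the values of `sigma0` and `sigma1` are hypothetical; only `pi1 = 0.037` is taken from the abstract, purely as an illustration.

```python
import math


def normal_pdf(x, sd):
    """Density of a zero-mean Gaussian with standard deviation sd at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def posterior_nonnull(z, pi1, sigma0, sigma1):
    """Posterior probability that a SNP with observed z-score z is non-null.

    Assumed model (a simplification, not the paper's exact form):
      null SNPs:      z ~ N(0, sigma0^2)
      non-null SNPs:  z ~ N(0, sigma0^2 + sigma1^2)
    """
    sd_nonnull = math.sqrt(sigma0 ** 2 + sigma1 ** 2)
    p_null = (1.0 - pi1) * normal_pdf(z, sigma0)
    p_nonnull = pi1 * normal_pdf(z, sd_nonnull)
    return p_nonnull / (p_null + p_nonnull)


def expected_effect(z, pi1, sigma0, sigma1):
    """Posterior mean of the true effect given z under the assumed model.

    For a non-null SNP the posterior mean is z shrunk by
    sigma1^2 / (sigma0^2 + sigma1^2); for a null SNP it is zero.
    """
    shrink = sigma1 ** 2 / (sigma0 ** 2 + sigma1 ** 2)
    return posterior_nonnull(z, pi1, sigma0, sigma1) * shrink * z


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    pi1, sigma0, sigma1 = 0.037, 1.0, 2.0
    for z in (0.0, 2.0, 4.0, 6.0):
        post = posterior_nonnull(z, pi1, sigma0, sigma1)
        eff = expected_effect(z, pi1, sigma0, sigma1)
        print(f"z={z:4.1f}  P(non-null|z)={post:.3f}  E[effect|z]={eff:.3f}")
```

Under these assumptions the expected true effect is always smaller in magnitude than the observed z-score, which illustrates the kind of over-estimation of effect sizes that the abstract refers to.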

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/688c/4754432/bc949c5ac622/fgene-07-00015-g0001.jpg
