Schwartzman Armin, Schork Andrew J, Zablocki Rong, Thompson Wesley K
University of California, San Diego, La Jolla, CA.
Institute of Biological Psychiatry, Mental Health Center St. Hans, Mental Health Services Copenhagen, Roskilde, Denmark.
Ann Appl Stat. 2019 Dec;13(4):2509-2538. doi: 10.1214/19-aoas1291. Epub 2019 Nov 28.
Analysis of genome-wide association studies (GWAS) is characterized by a large number of univariate regressions in which a quantitative trait is regressed on hundreds of thousands to millions of single-nucleotide polymorphism (SNP) allele counts, one at a time. This article proposes an estimator of the SNP heritability of the trait, defined here as the fraction of the variance of the trait explained by the SNPs in the study. The proposed GWAS heritability (GWASH) estimator is easy to compute, highly interpretable, and consistent as the number of SNPs and the sample size increase. More importantly, it can be computed from summary statistics typically reported in GWAS, without requiring access to the original data. The estimator takes full account of the linkage disequilibrium (LD), or correlation, between the SNPs in the study through moments of the LD matrix, which can be estimated from auxiliary datasets. In contrast to work on other estimators proposed in the literature, we establish the theoretical properties of the GWASH estimator and obtain analytical estimates of its precision. These results allow power and sample size calculations for SNP heritability estimates and form a firm foundation for future methodological development.
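The abstract does not state the estimator's formula, so the following is only a minimal sketch of a moment-based estimator of this general kind, not the authors' implementation. It assumes the form h2 = m (s2 - 1) / (n mu2), where m is the number of SNPs, n the sample size, s2 the mean squared z-score across SNPs, and mu2 the second moment of the LD matrix (trace of the squared LD matrix divided by m). The function names `ld_second_moment` and `gwash_sketch` are hypothetical, and the exact definition and any bias corrections should be checked against the paper.

```python
import numpy as np


def ld_second_moment(ld):
    """Second moment of an LD (correlation) matrix: trace(LD @ LD) / m.

    For a symmetric matrix this equals the sum of squared entries divided
    by m. Assumption: `ld` is treated as the true LD matrix. A sample LD
    matrix from a finite reference panel would overestimate this quantity,
    and no correction for that is applied here.
    """
    ld = np.asarray(ld, dtype=float)
    m = ld.shape[0]
    return float(np.sum(ld ** 2) / m)


def gwash_sketch(z, n, mu2):
    """Moment-based SNP heritability sketch from summary statistics.

    z   : per-SNP z-scores from the univariate regressions
    n   : GWAS sample size
    mu2 : second moment of the LD matrix (see ld_second_moment)

    Assumed form (not taken from the abstract): m * (s2 - 1) / (n * mu2),
    with s2 the mean of the squared z-scores.
    """
    z = np.asarray(z, dtype=float)
    m = z.size
    s2 = float(np.mean(z ** 2))
    return m * (s2 - 1.0) / (n * mu2)
```

Only per-SNP z-scores, the sample size, and one scalar summary of the LD matrix enter the calculation, which mirrors the abstract's point that the original genotype data are not needed. With uncorrelated SNPs the LD matrix is the identity, mu2 equals 1, and the sketch reduces to (m / n)(s2 - 1).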