Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA.
Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80217, USA.
Genetics. 2022 Apr 4;220(4). doi: 10.1093/genetics/iyac015.
Single nucleotide polymorphism heritability of a trait is the proportion of total variance explained by the additive effects of genome-wide single nucleotide polymorphisms. Linear mixed models are routinely used to estimate single nucleotide polymorphism heritability for many complex traits, which requires estimating a genetic relationship matrix among individuals. Heritability is usually estimated by restricted maximum likelihood or by method of moments approaches such as Haseman-Elston regression. The common practice for accounting for population substructure is to include the top few principal components of the genetic relationship matrix as covariates in the linear mixed model. This can become computationally very intensive on large biobank-scale datasets. Here, we propose a method of moments approach for estimating single nucleotide polymorphism heritability in the presence of population substructure. Our proposed method is computationally scalable on biobank datasets and gives an asymptotically unbiased estimate of heritability in the presence of discrete substructures. It introduces the adjustments for population stratification in a second-order estimating equation. It allows these substructures to differ in their single nucleotide polymorphism allele frequencies and in their trait distributions (means and variances), while the heritability is assumed to be the same across these substructures. Through extensive simulation studies and an application to 7 quantitative traits in the UK Biobank cohort, we demonstrate that our proposed method performs well in the presence of population substructure and is much more computationally efficient than existing approaches.
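For readers unfamiliar with the method of moments approach named in the abstract, the following is a minimal sketch of ordinary Haseman-Elston regression on simulated data from a single homogeneous population. It is not the stratification-adjusted estimator proposed in the paper, whose details are not given here. The function name `he_regression`, the simulation settings (sample size, number of variants, true heritability of 0.5), and the choice to regress through the origin are all assumptions made for illustration.

```python
import numpy as np

def he_regression(y, G):
    """Illustrative Haseman-Elston estimate of SNP heritability.

    y : (n,) phenotype vector
    G : (n, m) genotype matrix of minor-allele counts (0/1/2)
    """
    n, m = G.shape
    # Standardize the phenotype to mean 0, variance 1.
    y = (y - y.mean()) / y.std()
    # Standardize each SNP column, then form the genetic relationship matrix.
    Z = (G - G.mean(axis=0)) / G.std(axis=0)
    K = Z @ Z.T / m
    # Use each distinct pair (i < j) once.
    iu = np.triu_indices(n, k=1)
    k = K[iu]
    prods = np.outer(y, y)[iu]
    # Slope of y_i * y_j on K_ij, regression through the origin.
    return float(k @ prods / (k @ k))

# Hypothetical simulation: one population, no substructure.
rng = np.random.default_rng(0)
n, m, h2_true = 1000, 500, 0.5
freqs = rng.uniform(0.1, 0.9, size=m)          # allele frequencies
G = rng.binomial(2, freqs, size=(n, m)).astype(float)
Z = (G - G.mean(axis=0)) / G.std(axis=0)
beta = rng.normal(0.0, np.sqrt(h2_true / m), size=m)   # additive SNP effects
y = Z @ beta + rng.normal(0.0, np.sqrt(1.0 - h2_true), size=n)

h2_hat = he_regression(y, G)
print(f"true h2 = {h2_true}, HE estimate = {h2_hat:.3f}")
```

The estimator matches the second moments of the phenotype (products of pairs of trait values) to the corresponding entries of the genetic relationship matrix, so no likelihood is maximized. This sketch builds the full n-by-n matrix and would not scale to biobank-sized samples.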