Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America.
Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America.
PLoS Genet. 2022 Apr 20;18(4):e1010151. doi: 10.1371/journal.pgen.1010151. eCollection 2022 Apr.
With the advent of high-throughput genetic data, there have been attempts to estimate heritability from genome-wide SNP data on a cohort of distantly related individuals using a linear mixed model (LMM). Fitting such an LMM in a large-scale cohort study, however, is tremendously challenging because of the high-dimensional linear algebraic operations it requires. In this paper, we propose a new method named PredLMM that approximates the aforementioned LMM, motivated by the concepts of genetic coalescence and the Gaussian predictive process. PredLMM has substantially better computational complexity than most existing LMM-based methods and thus provides a fast alternative for estimating heritability in large-scale cohort studies. Theoretically, we show that under a model of genetic coalescence, the limiting form of our approximation is the celebrated predictive process approximation of large Gaussian process likelihoods, which has well-established accuracy standards. We illustrate our approach with extensive simulation studies and use it to estimate the heritability of multiple quantitative traits from the UK Biobank cohort.
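The abstract does not write the model out, so the following is a minimal sketch under standard assumptions, not the PredLMM implementation. It assumes a centered phenotype y = g + e with g ~ N(0, σ_g² K), e ~ N(0, σ_e² I), K a genomic relationship matrix (GRM), and heritability h² = σ_g² / (σ_g² + σ_e²). It contrasts the exact O(n³) likelihood with a generic predictive-process approximation in which a subset S of r individuals acts as "knots": K is replaced by K[:, S] K[S, S]⁻¹ K[S, :] plus a diagonal correction that restores diag(K) (the "modified" predictive process), so the likelihood can be evaluated with the Woodbury identity and the matrix determinant lemma in O(n r²). The function names, the random choice of knots, the diagonal correction, the profiling of the total variance, and the grid search over h² are all illustrative assumptions.

```python
import numpy as np


def standardize(genotypes):
    """Column-standardize an (n x p) matrix of 0/1/2 genotype counts, dropping monomorphic SNPs."""
    g = np.asarray(genotypes, dtype=float)
    g = g[:, g.std(axis=0) > 0]
    return (g - g.mean(axis=0)) / g.std(axis=0)


def grm(z):
    """Genomic relationship matrix K = Z Z^T / p from standardized genotypes Z."""
    return z @ z.T / z.shape[1]


def exact_loglik(y, K, h2):
    """Profiled log-likelihood of y ~ N(0, s2 * M), M = h2*K + (1-h2)*I; O(n^3) via Cholesky."""
    n = y.size
    L = np.linalg.cholesky(h2 * K + (1.0 - h2) * np.eye(n))
    v = np.linalg.solve(L, y)                  # v = L^{-1} y, so v @ v = y^T M^{-1} y
    s2 = v @ v / n                             # MLE of the total variance scale s2
    logdet = 2.0 * np.sum(np.log(np.diag(L)))  # log|M|
    return -0.5 * (n * np.log(2.0 * np.pi * s2) + logdet + n)


def predictive_process_loglik(y, K, h2, knots):
    """Same profiled likelihood with K replaced by a modified predictive-process approximation
    K ~= K[:, S] K[S, S]^{-1} K[S, :] + diag(correction), evaluated in O(n r^2)."""
    n, r = y.size, len(knots)
    Ls = np.linalg.cholesky(K[np.ix_(knots, knots)])  # K_SS = Ls Ls^T (r x r)
    A = np.linalg.solve(Ls, K[knots, :])              # r x n; A^T A = K_nS K_SS^{-1} K_Sn
    d = np.clip(np.diag(K) - np.sum(A * A, axis=0), 0.0, None)  # restores the exact diag(K)
    lam = h2 * d + (1.0 - h2)                  # diagonal part Lambda of the approximate M
    B = np.sqrt(h2) * A                        # approximate M = Lambda + B^T B
    BL = B / lam                               # B Lambda^{-1} (column j scaled by 1/lam[j])
    C = np.eye(r) + BL @ B.T                   # r x r capacitance matrix I + B Lambda^{-1} B^T
    Lc = np.linalg.cholesky(C)
    w = np.linalg.solve(Lc, BL @ y)            # Lc^{-1} B Lambda^{-1} y
    quad = np.sum(y * y / lam) - w @ w         # Woodbury identity: y^T M^{-1} y
    logdet = np.sum(np.log(lam)) + 2.0 * np.sum(np.log(np.diag(Lc)))  # determinant lemma
    s2 = quad / n
    return -0.5 * (n * np.log(2.0 * np.pi * s2) + logdet + n)


def estimate_h2(y, loglik, grid=None):
    """Grid-search MLE of h2; loglik(y, h2) is either profiled likelihood above."""
    grid = np.linspace(0.01, 0.99, 99) if grid is None else np.asarray(grid)
    values = [loglik(y, h2) for h2 in grid]
    return float(grid[int(np.argmax(values))])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, true_h2 = 500, 3000, 0.5
    G = rng.binomial(2, rng.uniform(0.1, 0.9, size=p), size=(n, p))  # synthetic genotypes
    Z = standardize(G)
    K = grm(Z)
    beta = rng.normal(0.0, np.sqrt(true_h2 / Z.shape[1]), size=Z.shape[1])
    y = Z @ beta + rng.normal(0.0, np.sqrt(1.0 - true_h2), size=n)
    y -= y.mean()                              # crude stand-in for an intercept fixed effect
    knots = rng.choice(n, size=100, replace=False)
    h2_exact = estimate_h2(y, lambda yy, h: exact_loglik(yy, K, h))
    h2_pp = estimate_h2(y, lambda yy, h: predictive_process_loglik(yy, K, h, knots))
    print(f"exact h2 ~ {h2_exact:.2f}, predictive-process h2 ~ {h2_pp:.2f}")
```

A design note on this sketch: K is formed in full only for readability. The approximate likelihood touches just the n × r block K[:, S] and the diagonal of K, so a scalable version would compute those pieces directly from the genotypes. When the knots span the column space of K (for example, all but one individual, since centering leaves K with rank n − 1), the approximation is exact and both likelihoods agree.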