Division of Health Statistics, School of Public Health, Shanxi Medical University, No. 56 Xinjian South Road, 030001 Shanxi, China.
Department of Biostatistics, University of Florida, 2004 Mowry Road, 32611 FL, USA.
Brief Bioinform. 2022 May 13;23(3). doi: 10.1093/bib/bbac115.
SNP heritability, the proportion of phenotypic variance explained by genome-wide single nucleotide polymorphisms (SNPs) in unrelated individuals, is an important measure of the genetic contribution to human diseases and plays a critical role in studying their genetic architecture. The linear mixed model (LMM) has been widely used for SNP heritability estimation, with the variance component parameters commonly estimated by restricted maximum likelihood (REML). REML relies on iterative optimization, which is computationally intensive when applied to large-scale datasets (e.g. UK Biobank). To facilitate heritability analysis of large-scale genetic datasets, we develop a fast approach, the minimum norm quadratic unbiased estimator (MINQUE) with batch training, to estimate variance components from the LMM (LMM.MNQ.BCH). In LMM.MNQ.BCH, the parameters are estimated by MINQUE, which has a closed-form solution that allows fast computation and has no convergence issues. LMM.MNQ.BCH also adopts batch training to accelerate computation on large-scale genetic datasets. Through simulations and real data analysis, we demonstrate that LMM.MNQ.BCH is much faster than two existing approaches, GCTA and BOLT-REML.
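To make the closed-form idea concrete, here is a minimal sketch of MINQUE for a two-component LMM, y = Xb + g + e with Var(y) = sigma_g^2 K + sigma_e^2 I, written in Python with NumPy. It is an illustration under stated assumptions, not the LMM.MNQ.BCH implementation: the function names (`minque`, `minque_batched`, `simulate`), the unit prior weights, the genetic relationship matrix K = ZZ'/m built from standardized genotypes, and especially the batching scheme (random disjoint batches whose estimates are averaged) are all assumptions, since the abstract does not describe how batch training is carried out.

```python
import numpy as np


def minque(y, X, kernels, weights=None):
    """MINQUE estimates of theta in Var(y) = sum_i theta_i * kernels[i].

    `weights` are the prior weights used to build V; all ones is an assumption.
    """
    if weights is None:
        weights = np.ones(len(kernels))
    V = sum(w * K for w, K in zip(weights, kernels))
    Vinv = np.linalg.inv(V)
    VinvX = Vinv @ X
    # Projection that removes the fixed effects X.
    P = Vinv - VinvX @ np.linalg.solve(X.T @ VinvX, VinvX.T)
    PK = [P @ K for K in kernels]
    Py = P @ y
    # S[i, j] = tr(P K_i P K_j); q[i] = y' P K_i P y.
    S = np.array([[np.sum(A * B.T) for B in PK] for A in PK])
    q = np.array([Py @ K @ Py for K in kernels])
    # One linear solve gives the estimates: no iteration, no convergence check.
    return np.linalg.solve(S, q)


def minque_batched(y, X, Z, n_batches, seed=0):
    """Hypothetical batching: run MINQUE on random disjoint batches and average."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    estimates = []
    for batch in np.array_split(order, n_batches):
        Zb = Z[batch]
        K = Zb @ Zb.T / Z.shape[1]  # genetic relationship matrix of the batch
        kernels = [K, np.eye(len(batch))]
        estimates.append(minque(y[batch], X[batch], kernels))
    theta = np.mean(estimates, axis=0)
    return theta, theta[0] / theta.sum()  # (sigma_g^2, sigma_e^2), h^2


def simulate(n, m, h2, seed=0):
    """Toy data: standardized binomial genotypes and a polygenic phenotype."""
    rng = np.random.default_rng(seed)
    Z = rng.binomial(2, 0.3, size=(n, m)).astype(float)
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    beta = rng.normal(0.0, np.sqrt(h2 / m), size=m)
    y = Z @ beta + rng.normal(0.0, np.sqrt(1.0 - h2), size=n)
    X = np.ones((n, 1))  # intercept only
    return y, X, Z


if __name__ == "__main__":
    y, X, Z = simulate(n=600, m=200, h2=0.5, seed=1)
    theta, h2_hat = minque_batched(y, X, Z, n_batches=3)
    print("variance components:", theta, "heritability:", h2_hat)
```

Each batch needs only an inverse and a few products of batch-sized matrices, so under this assumed scheme the cost grows with the batch size rather than with the full sample size. MINQUE estimates are not constrained to be non-negative, so a production implementation would need to handle negative variance estimates.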