Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, 80309, CO, USA.
Department of Psychology and Neuroscience, University of Colorado Boulder, Boulder, 80309, CO, USA.
BMC Bioinformatics. 2019 Jul 30;20(1):411. doi: 10.1186/s12859-019-2978-z.
Linear mixed-effects models (LMM) are a leading method for conducting genome-wide association studies (GWAS) but require residual maximum likelihood (REML) estimation of variance components, which is computationally demanding. Previous work has reduced the computational burden of variance component estimation by replacing direct matrix operations with iterative and stochastic methods and by employing loose tolerances to limit the number of iterations in the REML optimization procedure. Here, we introduce two novel algorithms, stochastic Lanczos derivative-free REML (SLDF_REML) and Lanczos first-order Monte Carlo REML (L_FOMC_REML), that exploit problem structure via the principle of Krylov subspace shift-invariance to speed computation beyond existing methods. Both novel algorithms require only a single round of computation involving iterative matrix operations, after which their respective objectives can be repeatedly evaluated using vector operations. Further, in contrast to existing stochastic methods, SLDF_REML can exploit precomputed genomic relatedness matrices (GRMs), when available, to further speed computation.
Results of numerical experiments are congruent with theory and demonstrate that interpreted-language implementations of both algorithms match or exceed existing compiled-language software packages in speed, accuracy, and flexibility.
Both the SLDF_REML and L_FOMC_REML algorithms outperform existing methods for REML estimation of variance components for LMM and are suitable for incorporation into existing GWAS LMM software implementations.
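The shift-invariance idea mentioned above can be illustrated with a short NumPy sketch. It is not the authors' implementation. It assumes the usual two-component LMM covariance, proportional to K + τI, where K is the GRM and τ is the ratio of residual to genetic variance; the abstract does not spell out this parameterization. The function names `lanczos_tridiag` and `shifted_quadratic_forms` are hypothetical. Because the Krylov subspace generated by K and a vector b is the same as the one generated by K + τI and b, a single Lanczos run on K yields a small tridiagonal matrix T, and T + τI then serves every value of τ. Quadratic forms such as bᵀ(K + τI)⁻¹b and bᵀlog(K + τI)b, the building blocks of stochastic trace and log-determinant estimates, can then be re-evaluated for each τ using only operations on short vectors.

```python
import numpy as np


def lanczos_tridiag(K, b, m):
    """Run m Lanczos steps on symmetric K with start vector b.

    Returns the m x m tridiagonal matrix T. Full reorthogonalization is
    used for numerical stability; this sketch does not handle breakdown
    (an exactly invariant Krylov subspace).
    """
    n = b.shape[0]
    Q = np.zeros((n, m))
    alpha = np.zeros(m)
    beta = np.zeros(m - 1)
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(m):
        w = K @ Q[:, j]                 # the only matrix-vector product per step
        alpha[j] = Q[:, j] @ w
        for _ in range(2):              # orthogonalize twice against the basis
            w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)
        if j < m - 1:
            beta[j] = np.linalg.norm(w)
            Q[:, j + 1] = w / beta[j]
    return np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)


def shifted_quadratic_forms(T, b_norm, shifts):
    """Approximate b'(K + tau I)^{-1} b and b' log(K + tau I) b for each tau.

    T is decomposed once; by shift-invariance, T + tau I has the same
    eigenvectors and eigenvalues theta + tau, so each new tau costs only
    O(m) vector arithmetic.
    """
    theta, S = np.linalg.eigh(T)
    weights = S[0, :] ** 2              # Gauss quadrature weights
    inv_forms = [b_norm**2 * np.sum(weights / (theta + tau)) for tau in shifts]
    log_forms = [b_norm**2 * np.sum(weights * np.log(theta + tau)) for tau in shifts]
    return inv_forms, log_forms


# Toy data: a GRM-like matrix from simulated standardized genotypes (assumed).
rng = np.random.default_rng(0)
G = rng.standard_normal((40, 400))
K = G @ G.T / 400
b = rng.standard_normal(40)

T = lanczos_tridiag(K, b, m=25)         # one round of iterative matrix operations
inv_forms, log_forms = shifted_quadratic_forms(T, np.linalg.norm(b), [0.1, 1.0, 10.0])
```

In a stochastic estimator, these quadratic forms would be averaged over several random probe vectors b. The point of the sketch is only that changing τ never requires touching K again.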