Stochastic Lanczos estimation of genomic variance components for linear mixed-effects models.

Affiliations

Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, 80309, CO, USA.

Department of Psychology and Neuroscience, University of Colorado Boulder, Boulder, 80309, CO, USA.

Publication information

BMC Bioinformatics. 2019 Jul 30;20(1):411. doi: 10.1186/s12859-019-2978-z.

PMID:31362713

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6668092/

Abstract

BACKGROUND

Linear mixed-effects models (LMM) are a leading method for conducting genome-wide association studies (GWAS) but require residual maximum likelihood (REML) estimation of variance components, which is computationally demanding. Previous work has reduced the computational burden of variance component estimation by replacing direct matrix operations with iterative and stochastic methods and by employing loose tolerances to limit the number of iterations in the REML optimization procedure. Here, we introduce two novel algorithms, stochastic Lanczos derivative-free REML (SLDF_REML) and Lanczos first-order Monte Carlo REML (L_FOMC_REML), that exploit problem structure via the principle of Krylov subspace shift-invariance to speed computation beyond existing methods. Both novel algorithms require only a single round of computation involving iterative matrix operations, after which their respective objectives can be repeatedly evaluated using vector operations. Further, in contrast to existing stochastic methods, SLDF_REML can exploit precomputed genomic relatedness matrices (GRMs), when available, to further speed computation.
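The shift-invariance idea in the paragraph above can be illustrated with a minimal NumPy sketch. This is not the authors' SLDF_REML or L_FOMC_REML code. It assumes the covariance is parameterized as V = σg²(K + δI) with δ = σe²/σg², where K is the GRM. Because the Krylov subspace of K + δI started from a vector b equals that of K, a single Lanczos run on K yields Q and a tridiagonal T with Qᵀ(K + δI)Q = T + δI for every δ. The eigenvalues of T + δI are the Ritz values of T shifted by δ, so both log det(K + δI) (by stochastic Lanczos quadrature with Rademacher probes) and yᵀ(K + δI)⁻¹y can be re-evaluated for any δ using vector operations only. The matvec argument can multiply by a precomputed GRM or compute Z(Zᵀv)/p implicitly from standardized genotypes Z. The helper names (`lanczos`, `ShiftedLanczosCache`, `neg2_profile_loglik`), the probe and step counts, and the profiled likelihood without fixed effects (a simplification standing in for the paper's REML objective) are hypothetical choices made for illustration.

```python
import numpy as np


def lanczos(matvec, b, k):
    """Run k Lanczos steps on a symmetric operator, starting from b.

    Returns the Ritz values theta (eigenvalues of the k x k tridiagonal T)
    and tau2, the squared first components of T's eigenvectors, which are
    the Gauss quadrature weights for the start vector b / ||b||.
    """
    n = b.shape[0]
    Q = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(k - 1)
    q = b / np.linalg.norm(b)
    for j in range(k):
        Q[:, j] = q
        w = matvec(q)
        alpha[j] = q @ w
        # Full reorthogonalization (two passes) keeps Q numerically orthonormal.
        for _ in range(2):
            w = w - Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        if j < k - 1:
            beta[j] = np.linalg.norm(w)
            q = w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    theta, U = np.linalg.eigh(T)
    return theta, U[0, :] ** 2


class ShiftedLanczosCache:
    """Hypothetical helper: one round of Lanczos runs on K, then cheap
    evaluation of log det(K + delta*I) and y^T (K + delta*I)^{-1} y."""

    def __init__(self, matvec, y, n_probes=20, k=30, seed=0):
        rng = np.random.default_rng(seed)
        self.n = y.shape[0]
        # Rademacher probes for the stochastic (Hutchinson) trace estimate.
        probes = rng.choice([-1.0, 1.0], size=(n_probes, self.n))
        self.probe_nodes = [lanczos(matvec, z, k) for z in probes]
        # A Lanczos run started at y gives the quadratic form in y directly.
        self.y_norm2 = y @ y
        self.y_nodes = lanczos(matvec, y, k)

    def logdet(self, delta):
        # Shift invariance: T + delta*I has eigenvalues theta + delta and the
        # same eigenvectors, so the weights tau2 are reused unchanged.
        # ||z||^2 = n for Rademacher z, hence the factor n.
        est = [np.sum(tau2 * np.log(theta + delta))
               for theta, tau2 in self.probe_nodes]
        return self.n * np.mean(est)

    def quad_form(self, delta):
        theta, tau2 = self.y_nodes
        return self.y_norm2 * np.sum(tau2 / (theta + delta))


def neg2_profile_loglik(cache, delta):
    """-2 x log-likelihood of y ~ N(0, s2 * (K + delta*I)) with s2 profiled
    out. No fixed effects: an illustrative stand-in for REML, not REML."""
    n = cache.n
    s2_hat = cache.quad_form(delta) / n
    return (n * np.log(s2_hat) + cache.logdet(delta)
            + n * (1.0 + np.log(2.0 * np.pi)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p = 200, 500
    Z = rng.standard_normal((n, p))   # toy standardized genotypes
    K = Z @ Z.T / p                   # toy stand-in for a precomputed GRM
    y = rng.standard_normal(n)
    # Explicit-GRM matvec; lambda v: Z @ (Z.T @ v) / p avoids forming K.
    cache = ShiftedLanczosCache(lambda v: K @ v, y, n_probes=30, k=40)
    deltas = np.linspace(0.1, 5.0, 50)
    # After the cache is built, each grid point uses vector operations only.
    best = min(deltas, key=lambda d: neg2_profile_loglik(cache, d))
    print("grid minimizer of the toy objective:", best)
```

In this sketch the expensive part, k matrix-vector products per probe plus one run started at y, happens once; each additional δ then costs only O(mk) scalar operations for m probes. Full reorthogonalization adds O(nk²) work per run and is a robustness choice of the sketch rather than something the abstract specifies.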

RESULTS

Results of numerical experiments are congruent with theory and demonstrate that interpreted-language implementations of both algorithms match or exceed existing compiled-language software packages in speed, accuracy, and flexibility.

CONCLUSIONS

Both the SLDF_REML and L_FOMC_REML algorithms outperform existing methods for REML estimation of variance components for LMM and are suitable for incorporation into existing GWAS LMM software implementations.
