Computational Genomics, IBM T.J. Watson Research Center, Yorktown Heights, New York 10598, USA.
Computer Science Department, Purdue University, West Lafayette, Indiana 47907, USA.
Genome Res. 2024 Oct 11;34(9):1304-1311. doi: 10.1101/gr.279230.124.
Linear mixed models (LMMs) have been widely used in genome-wide association studies to control for population stratification and cryptic relatedness. However, estimating LMM parameters is computationally expensive because it requires large-scale matrix operations to build the genetic relationship matrix (GRM). Over the past 25 years, Randomized Linear Algebra has provided alternative approaches to such matrix operations by leveraging random sampling, which often yields provably accurate, fast, and efficient approximations. We leverage matrix sketching to develop a fast and efficient LMM method called Matrix-Sketching LMM (MaSk-LMM), which sketches the genotype matrix to reduce its dimensions and speed up computations. Our framework comes with both theoretical guarantees and strong empirical performance compared with the current state of the art on simulated traits and complex diseases.
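To illustrate why reducing the dimensions of the genotype matrix speeds up GRM construction, here is a minimal sketch under stated assumptions. It uses 0/1/2 genotype coding, per-SNP standardization, and the common convention K = XXᵀ/m for an n × m standardized genotype matrix X. It approximates K from s SNP columns sampled uniformly at random with replacement and rescaled so that the estimate equals K in expectation, which cuts the cost from O(n²m) to O(n²s). This is a generic randomized-sampling sketch, not MaSk-LMM's actual sketching scheme. The function names (`standardize_genotypes`, `exact_grm`, `sampled_grm`) are hypothetical.

```python
import numpy as np


def standardize_genotypes(G):
    """Center and scale each SNP column of an n x m genotype matrix (0/1/2 counts).

    Assumption: per-SNP standardization; monomorphic SNPs (zero variance) are
    dropped so that no column is divided by zero.
    """
    G = np.asarray(G, dtype=float)
    mean = G.mean(axis=0)
    std = G.std(axis=0)
    keep = std > 0
    return (G[:, keep] - mean[keep]) / std[keep]


def exact_grm(X):
    """Exact GRM K = X X^T / m for a standardized n x m matrix X; costs O(n^2 m)."""
    return X @ X.T / X.shape[1]


def sampled_grm(X, s, rng):
    """Approximate GRM from s SNP columns sampled uniformly with replacement.

    Each sampled column is rescaled by sqrt(m / s), so E[Xs Xs^T] = X X^T and
    the returned matrix is an unbiased estimate of K; costs O(n^2 s).
    """
    n, m = X.shape
    idx = rng.integers(0, m, size=s)      # s SNP indices, uniform, with replacement
    Xs = X[:, idx] * np.sqrt(m / s)       # n x s sketch of the genotype matrix
    return Xs @ Xs.T / m


# Usage on simulated genotypes (hypothetical sizes, unrelated individuals).
rng = np.random.default_rng(0)
n, m = 200, 5000
freqs = rng.uniform(0.05, 0.5, size=m)    # assumed allele frequencies
G = rng.binomial(2, freqs, size=(n, m))   # n x m matrix of 0/1/2 genotypes
X = standardize_genotypes(G)
K = exact_grm(X)
for s in (500, 2000, 5000):
    K_hat = sampled_grm(X, s, rng)
    rel_err = np.linalg.norm(K - K_hat) / np.linalg.norm(K)
    print(f"s={s}: relative Frobenius error {rel_err:.3f}")
```

Uniform sampling keeps the example short. Gaussian random projections and leverage-score sampling are common alternatives with different accuracy and cost trade-offs, and the error of any such sketch shrinks as s grows.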