Department of Animal Science, North Carolina State University, Raleigh, NC 27695, United States.
Animal Genomics and Improvement Laboratory, USDA-ARS, Beltsville, MD 20705, United States.
Bioinformatics. 2023 Mar 1;39(3). doi: 10.1093/bioinformatics/btad127.
The amount of genomic data is increasing exponentially. Using many genotyped and phenotyped individuals for genomic prediction is appealing yet challenging.
We present SLEMM (short for Stochastic-Lanczos-Expedited Mixed Models), a new software tool that addresses this computational challenge. SLEMM builds on an efficient implementation of the stochastic Lanczos algorithm for REML in a mixed-model framework. We further implement SNP weighting in SLEMM to improve its predictions. Extensive analyses of seven public datasets, covering 19 polygenic traits in three plant and three livestock species, showed that SLEMM with SNP weighting had the best overall predictive ability among a variety of genomic prediction methods, including GCTA's empirical BLUP, BayesR, KAML, and LDAK's BOLT and BayesR models. We also compared the methods using nine dairy traits of ∼300k genotyped cows. All methods had similar overall prediction accuracies, except that KAML failed to process the data. Additional simulation analyses of up to 3 million individuals and 1 million SNPs showed that SLEMM outperformed its counterparts in computational performance. Overall, SLEMM can perform million-scale genomic predictions with an accuracy comparable to that of BayesR.
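To give a feel for why a stochastic Lanczos approach can make REML tractable at scale, the following is a minimal sketch of generic stochastic Lanczos quadrature for estimating log det V, a term that appears in the REML log-likelihood. It is not SLEMM's implementation, and the abstract does not detail how SLEMM applies the algorithm. The function names (`lanczos`, `slq_logdet`, `V_matvec`), the simulated genotype matrix `Z`, and the variance components `s2g` and `s2e` are all assumptions made for illustration. The point it shows is that V is only ever touched through matrix-vector products, so the n-by-n relationship matrix never has to be formed or factorized.

```python
import numpy as np

def lanczos(matvec, v0, m):
    """Run up to m Lanczos steps; return the tridiagonal entries (alpha, beta)."""
    n = v0.shape[0]
    Q = np.zeros((n, m))
    alpha = np.zeros(m)
    beta = np.zeros(m - 1)
    q = v0 / np.linalg.norm(v0)
    for j in range(m):
        Q[:, j] = q
        w = matvec(q)
        alpha[j] = q @ w
        # Full reorthogonalization against all Lanczos vectors built so far.
        w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)
        if j == m - 1:
            break
        b = np.linalg.norm(w)
        if b < 1e-12:  # invariant subspace found; stop early
            return alpha[:j + 1], beta[:j]
        beta[j] = b
        q = w / b
    return alpha, beta

def slq_logdet(matvec, n, num_probes=30, m=20, seed=0):
    """Estimate log det(V) = tr(log V) for a symmetric positive definite V."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)  # Rademacher probe, ||z||^2 = n
        alpha, beta = lanczos(matvec, z, m)
        T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
        theta, U = np.linalg.eigh(T)
        # Gauss quadrature: z' log(V) z ~ ||z||^2 * sum_k U[0,k]^2 * log(theta_k)
        total += n * np.sum(U[0, :] ** 2 * np.log(theta))
    return total / num_probes

# Toy setup (all values simulated and illustrative, not taken from the paper).
rng = np.random.default_rng(1)
n, p = 200, 500
Z = rng.standard_normal((n, p))  # stand-in for standardized genotypes
s2g, s2e = 2.0, 1.0              # hypothetical variance components

def V_matvec(x):
    """Apply V = s2g * Z Z' / p + s2e * I without forming V."""
    return s2g * (Z @ (Z.T @ x)) / p + s2e * x

estimate = slq_logdet(V_matvec, n)

# Dense reference, feasible only because this toy problem is small.
V = s2g * (Z @ Z.T) / p + s2e * np.eye(n)
exact = np.linalg.slogdet(V)[1]
print(estimate, exact)
```

Each probe costs m matrix-vector products with Z, so the work grows with the size of the genotype matrix rather than with the cube of the number of individuals. The estimate is stochastic: more probes reduce its variance, and more Lanczos steps reduce the quadrature error.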
The software is available at https://github.com/jiang18/slemm.