Misztal I, Legarra A
1 Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA.
2 UMR1388 GenePhySE, INRA, Castanet Tolosan, 31326, France.
Animal. 2017 May;11(5):731-736. doi: 10.1017/S1751731116002366. Epub 2016 Nov 21.
The purpose of this study is to review and evaluate computing methods used in genomic selection for animal breeding. Commonly used models include SNP BLUP with extensions (BayesA, etc.), genomic BLUP (GBLUP) and single-step GBLUP (ssGBLUP). These models are applied to genome-wide association studies (GWAS), genomic prediction and parameter estimation. Solving methods include finite (direct) Cholesky decomposition, possibly with a sparse implementation, and iterative Gauss-Seidel (GS) or preconditioned conjugate gradient (PCG), the last two possibly with iteration on data. Details are provided that can drastically reduce the cost of some computations. For SNP BLUP, especially with sampling and a large number of SNPs, the only choice is GS with iteration on data and adjustment of residuals. If only solutions are required, PCG with iteration on data is a clear choice. A genomic relationship matrix (GRM) has limited dimensionality because of small effective population size, which results in an infinite number of generalized inverses of the GRM for large genotyped populations. A specific inverse called APY (algorithm for proven and young) requires only a small fraction of the GRM, is sparse, and can be computed and stored at low cost for millions of animals. With the APY inverse and PCG iteration, GBLUP and ssGBLUP can be applied to any population. Both tools can be applied to GWAS. When the system of equations is sparse but contains dense blocks, a recently developed package for sparse Cholesky decomposition and sparse inversion called YAMS performs much better than packages that treat such blocks as sparse. With YAMS, GREML and possibly single-step GREML can be applied to populations with >50 000 genotyped animals. From a computational perspective, genomic selection is becoming a mature methodology.
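As an illustration of the "GS with adjustment of residuals" idea mentioned for SNP BLUP, the following is a minimal sketch and not the authors' implementation. It assumes the simple model y = 1*mu + Z*a + e with a single ridge parameter lam = var_e / var_a. The function name snp_blup_gs and all of its arguments are hypothetical. The point is that each SNP effect is updated from the current residual vector, which is then corrected in place, so the matrix Z'Z is never formed.

```python
import numpy as np

def snp_blup_gs(Z, y, lam, n_iter=1000, tol=1e-10):
    """Gauss-Seidel with residual updating for y = 1*mu + Z a + e.

    Z   : (n, m) genotype matrix (assumed layout: animals in rows, SNPs in columns)
    y   : (n,) phenotypes
    lam : assumed variance ratio var_e / var_a (ridge parameter)
    """
    n, m = Z.shape
    mu = 0.0
    a = np.zeros(m)
    zz = np.einsum("ij,ij->j", Z, Z)        # diagonal of Z'Z, computed once
    e = np.asarray(y, dtype=float).copy()   # residuals at the start values mu=0, a=0
    for _ in range(n_iter):
        # Update the intercept from the residuals, then adjust the residuals.
        mu_new = (e.sum() + n * mu) / n
        e -= mu_new - mu
        max_change = abs(mu_new - mu)
        mu = mu_new
        # Update one SNP effect at a time; cost per SNP is O(n).
        for j in range(m):
            zj = Z[:, j]
            a_new = (zj @ e + zz[j] * a[j]) / (zz[j] + lam)
            delta = a_new - a[j]
            e -= zj * delta                 # keep e equal to y - mu - Z a
            a[j] = a_new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return mu, a
```

Each full round costs about one pass over Z, which is why this scheme is compatible with reading genotypes from disk ("iteration on data") rather than holding a coefficient matrix in memory. The stopping rule based on the largest change in a round is an assumption made for this sketch.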