Computational strategies for national integration of phenotypic, genomic, and pedigree data in a single-step best linear unbiased prediction.

Author information

INRA, UR 631 SAGA, F-31326 Castanet Tolosan, France.

Publication information

J Dairy Sci. 2012 Aug;95(8):4629-45. doi: 10.3168/jds.2011-4982.

Abstract

The single-step genomic BLUP (SSGBLUP) is a method that can integrate pedigree and genotypes at molecular markers in an optimal way. However, its present form (regular SSGBLUP) has a high computational cost (cubic in the number of genotyped animals) and may need extensive rewriting of genetic evaluation software. In this work, we propose several strategies to implement the single step in a simpler manner. The first one expands the single-step mixed-model equations to obtain equivalent equations from which the regular (including pedigree and records only) mixed-model equations are a subset. These new equations (unsymmetric extended SSGBLUP) have low computational cost, but require a nonsymmetric solver such as the biconjugate gradient stabilized method or successive underrelaxation, which is a variant of successive overrelaxation, with a relaxation factor lower than 1. In addition, we show a new derivation of the single-step method, which includes, as an extra effect, deviations from strictly polygenic breeding values. As a result, the same set of equations as above is obtained. We show that, whereas the new derivation shows apparent problems of nonpositive definiteness for certain covariance matrices, a proper equivalent model including imaginary effects always exists, leading always to the regular SSGBLUP mixed model equations. The system of equations can be solved (iterative SSGBLUP) by iterating between a pedigree and records evaluation and a genomic evaluation (each one solved by any iterative or direct method), whereas global iteration can use a block version of successive underrelaxation, which ensures convergence. The genomic evaluation can explicitly include marker or haplotype effects and possibly involve nonlinear (e.g., Bayesian by Markov chain Monte Carlo) methods. In a simulated example with 28,800 individuals and 1,800 genotyped individuals, all methods converged quickly to the same solutions. 
Using existing efficient methods with limited memory requirements to compute the products Gt and A22t for any vector t (where G and A22 are the genomic and pedigree relationship matrices for genotyped animals), all strategies can be converted to iteration-on-data procedures for which the total number of operations is linear in (number of animals) + (number of genotyped animals × number of markers).
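
The cost claim in the last sentence rests on computing the product of G with a vector without ever building G. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a genomic relationship matrix of the common form G = ZZ'/k, where Z holds genotype codes centred by allele frequency and k = 2Σp(1−p); the abstract does not say which form of G is used. The function names and the toy genotypes are hypothetical.

```python
# Sketch only. Assumption (not stated in the abstract): G = Z Z' / k, with Z the
# allele-frequency-centred genotype matrix and k = 2 * sum(p_j * (1 - p_j)).

def centred_genotypes(genotypes):
    """Centre 0/1/2 genotype codes by twice the observed allele frequency."""
    n = len(genotypes)
    m = len(genotypes[0])
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    z = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in genotypes]
    k = 2.0 * sum(p * (1.0 - p) for p in freqs)
    return z, k


def g_times(z, k, t):
    """Return G t as Z (Z' t) / k without forming the n x n matrix G."""
    m = len(z[0])
    # Step 1: w = Z' t, one pass over the genotypes (n * m operations).
    w = [0.0] * m
    for row, t_i in zip(z, t):
        for j, z_ij in enumerate(row):
            w[j] += z_ij * t_i
    # Step 2: Z w / k, a second pass over the genotypes (n * m operations).
    return [sum(z_ij * w_j for z_ij, w_j in zip(row, w)) / k for row in z]


def g_times_dense(z, k, t):
    """Reference version that builds G explicitly (n * n * m operations)."""
    g = [[sum(a * b for a, b in zip(ri, rj)) / k for rj in z] for ri in z]
    return [sum(g_ij * t_j for g_ij, t_j in zip(g_row, t)) for g_row in g]


# Toy data: 3 genotyped animals, 4 markers (hypothetical values).
genotypes = [
    [0, 1, 2, 1],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
]
z, k = centred_genotypes(genotypes)
t = [1.0, -2.0, 0.5]
fast = g_times(z, k, t)
dense = g_times_dense(z, k, t)
```

Both routines return the same vector up to rounding. The matrix-free version touches each genotype twice, so its cost grows with (number of genotyped animals × number of markers), whereas the dense version also pays for forming every pair of animals. An iterative solver that only needs such products can therefore run without storing G.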
