Technical note: Derivation of equivalent computing algorithms for genomic predictions and reliabilities of animal merit.

Author information

Strandén I, Garrick D J

Affiliation

MTT Agrifood Research Finland, FIN-31600 Jokioinen, Finland.

Publication information

J Dairy Sci. 2009 Jun;92(6):2971-5. doi: 10.3168/jds.2008-1929.

DOI:10.3168/jds.2008-1929

PMID:19448030

Abstract

Conventional prediction of dairy cattle merit involves setting up and solving linear equations with the number of unknowns being the number of animals, typically millions, multiplied by the number of traits being simultaneously assessed. The coefficient matrix has been large and sparse and iteration on data has been the method of choice, whereby the coefficient matrix is not stored but recreated as needed. In contrast, genomic prediction involves assessment of the merit of genome fragments characterized by single nucleotide polymorphism genotypes, currently some 50,000, which can then be used to predict the merit of individual animals according to the fragments they have inherited. The prediction equations for chromosome fragments typically have fewer than 100,000 unknowns, but the number of observations used to predict the fragment effects can be one-tenth the number of fragments. The coefficient matrix tends to be dense and the resulting system of equations can be ill behaved. Equivalent computing algorithms for genomic prediction were derived. The number of unknowns in the equivalent system grows with number of genotyped animals, usually bulls, rather than the number of chromosome fragment effects. In circumstances with fewer genotyped animals than single nucleotide polymorphism genotypes, these equivalent computations allow the solving of a smaller system of equations that behaves numerically better. There were 3 solving strategies compared: 1 method that formed and stored the coefficient matrix in memory and 2 methods that iterate on data. Finally, formulas for reliabilities of genomic predictions of merit were developed.
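Illustrative sketch

The equivalence described in the abstract can be illustrated with a small numerical sketch. It is not the authors' implementation. It assumes a single trait, no fixed effects, a centered genotype matrix `Z` (animals by SNPs), and a hypothetical shrinkage parameter `lam` standing for the ratio of residual to marker-effect variance. All function names are made up for illustration. The marker system has one unknown per SNP, the equivalent system has one unknown per genotyped animal, and both yield the same marker effects through the identity (Z'Z + lam I)^-1 Z' = Z'(ZZ' + lam I)^-1. The third function is a matrix-free conjugate-gradient solve, shown only as one possible way to avoid storing the coefficient matrix; the abstract does not say which iterative solvers were compared.

```python
import numpy as np


def solve_marker_system(Z, y, lam):
    """Marker model: solve (Z'Z + lam*I) a = Z'y, one unknown per SNP."""
    n_snps = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(n_snps), Z.T @ y)


def solve_animal_system(Z, y, lam):
    """Equivalent model: solve (ZZ' + lam*I) alpha = y, one unknown per animal."""
    n_animals = Z.shape[0]
    alpha = np.linalg.solve(Z @ Z.T + lam * np.eye(n_animals), y)
    return Z.T @ alpha  # map animal-level solutions back to marker effects


def solve_animal_system_cg(Z, y, lam, tol=1e-10, max_iter=1000):
    """Same animal-level system, solved by conjugate gradients without forming ZZ'."""

    def matvec(v):
        # (ZZ' + lam*I) v computed as two matrix-vector products
        return Z @ (Z.T @ v) + lam * v

    alpha = np.zeros_like(y)
    r = y.copy()          # residual for the starting value alpha = 0
    p = r.copy()
    rs = r @ r
    y_norm = np.linalg.norm(y)
    for _ in range(max_iter):
        Ap = matvec(p)
        step = rs / (p @ Ap)
        alpha += step * p
        r -= step * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol * y_norm:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return Z.T @ alpha


# Simulated data: far fewer genotyped animals than SNPs (values are arbitrary).
rng = np.random.default_rng(0)
n_animals, n_snps = 20, 200
Z = rng.integers(0, 3, size=(n_animals, n_snps)).astype(float)
Z -= Z.mean(axis=0)       # center genotype codes by column
y = rng.normal(size=n_animals)
lam = 50.0                # hypothetical variance ratio

a_marker = solve_marker_system(Z, y, lam)     # 200 x 200 system
a_animal = solve_animal_system(Z, y, lam)     # 20 x 20 system
a_cg = solve_animal_system_cg(Z, y, lam)      # no coefficient matrix stored

gebv = Z @ a_animal       # predicted genomic merit of each genotyped animal
```

With 20 animals and 200 SNPs, the animal-level system is ten times smaller in each dimension, which mirrors the case of fewer genotyped animals than SNP genotypes that the abstract highlights.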
