The Dimensionality of Genomic Information and Its Effect on Genomic Prediction.

Authors

Pocrnic Ivan, Lourenco Daniela A L, Masuda Yutaka, Legarra Andres, Misztal Ignacy

Affiliations

Department of Animal and Dairy Science, University of Georgia, Athens, Georgia 30602

Publication information

Genetics. 2016 May;203(1):573-81. doi: 10.1534/genetics.116.187013. Epub 2016 Mar 4.

The genomic relationship matrix (GRM) can be inverted by the algorithm for proven and young (APY), which is based on recursion on a random subset of animals. While a regular inverse has a cubic cost, the cost of the APY inverse can be close to linear. Theory for the APY assumes that the optimal size of the subset (the size that maximizes the accuracy of genomic predictions) is due to a limited dimensionality of the GRM, which is a function of the effective population size (Ne). The objective of this study was to evaluate these assumptions by simulation. Six populations were simulated with approximate Ne from 20 to 200. Each population consisted of 10 nonoverlapping generations, with 25,000 animals per generation and phenotypes available for generations 1 to 9. The last 3 generations were fully genotyped, assuming genome length L = 30. The GRM was constructed for each population and analyzed for the distribution of its eigenvalues. Genomic estimated breeding values (GEBV) were computed by single-step GBLUP, using either a direct or an APY inverse of the GRM. The sizes of the subset in APY were set to the number of the largest eigenvalues explaining x% of the variation in the GRM (EIGx, x = 90, 95, 98, 99). Accuracies of GEBV for the last generation with the APY inverse peaked at EIG98 and were slightly lower with EIG95, EIG99, or the direct inverse. Most information in the GRM is contained in approximately NeL largest eigenvalues, with no information beyond 4NeL. Genomic predictions with the APY inverse of the GRM are more accurate than with the regular inverse.
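The two computations named in the abstract, counting the largest eigenvalues that explain x% of the variation in the GRM and forming an APY inverse from a subset of animals, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names `n_eigen_for_variance` and `apy_inverse`, the toy genotypes, the allele-frequency-based construction of the GRM, and the 0.05 blending weight are all assumptions made for illustration; the block form of the APY inverse follows the formulation commonly given in the APY literature, in which non-subset animals are conditioned on the subset and their residuals are treated as independent.

```python
import numpy as np


def n_eigen_for_variance(G, fraction):
    """Number of largest eigenvalues of G explaining `fraction` of its variation."""
    eigvals = np.linalg.eigvalsh(G)[::-1]      # eigenvalues in descending order
    eigvals = np.clip(eigvals, 0.0, None)      # guard against tiny negative values
    cum = np.cumsum(eigvals) / eigvals.sum()
    count = int(np.searchsorted(cum, fraction)) + 1
    return min(count, G.shape[0])


def apy_inverse(G, core):
    """Sketch of an APY-style inverse of G given indices of the core subset."""
    n = G.shape[0]
    core = np.asarray(core)
    noncore = np.setdiff1d(np.arange(n), core)

    Gcc_inv = np.linalg.inv(G[np.ix_(core, core)])
    Gcn = G[np.ix_(core, noncore)]
    P = Gcc_inv @ Gcn                          # regression of noncore on core

    # Diagonal of M_nn: m_j = g_jj - g_jc Gcc^{-1} g_cj for each noncore animal j
    m = G[noncore, noncore] - np.einsum("ij,ij->j", Gcn, P)
    PM = P / m                                 # scales column j by 1 / m_j

    Ginv = np.zeros((n, n))
    Ginv[np.ix_(core, core)] = Gcc_inv + PM @ P.T
    Ginv[np.ix_(core, noncore)] = -PM
    Ginv[np.ix_(noncore, core)] = -PM.T
    Ginv[noncore, noncore] = 1.0 / m           # only the diagonal is nonzero
    return Ginv


# Toy data (assumed, far smaller than the simulated populations in the study)
rng = np.random.default_rng(0)
n_animals, n_markers = 60, 500
p = rng.uniform(0.1, 0.9, n_markers)                    # allele frequencies
M = rng.binomial(2, p, size=(n_animals, n_markers))     # genotypes coded 0/1/2
Z = M - 2.0 * p
G = Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))
G = 0.95 * G + 0.05 * np.eye(n_animals)                 # blending keeps G invertible

k = n_eigen_for_variance(G, 0.98)                       # analogue of EIG98
core = rng.choice(n_animals, size=k, replace=False)     # random subset of size k
Ginv_apy = apy_inverse(G, core)
```

Only the subset-by-subset block needs a dense inverse here; the block for the remaining animals is diagonal, which is why the cost can grow nearly linearly with the number of animals outside the subset.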
