A comprehensive study on size and definition of the core group in the proven and young algorithm for single-step GBLUP.

Affiliation

Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA.

Publication information

Genet Sel Evol. 2022 May 20;54(1):34. doi: 10.1186/s12711-022-00726-6.

PMID:35596130

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9123737/

Abstract

BACKGROUND

The algorithm for proven and young (APY) has been suggested as a way to compute, recursively, a sparse representation of the inverse of a large genomic relationship matrix (G). In APY, a subset of genotyped individuals is designated as the core, and the remaining genotyped individuals are designated as noncore. The size and definition of the core are key research questions for applying APY, especially given the ever-increasing number of genotyped individuals.
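To make the core/noncore split concrete, here is a minimal numpy sketch of the usual APY construction, in which each noncore animal is conditioned only on the core. The inverse is assembled from the inverse of the core block G_cc plus a diagonal matrix M_nn with m_ii = g_ii - g_ic G_cc^-1 g_ci. It assumes a small, dense, positive definite G; the function name `apy_inverse` is hypothetical, and a production implementation would keep the noncore block diagonal and sparse instead of forming a dense n × n matrix.

```python
import numpy as np


def apy_inverse(G: np.ndarray, core: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """APY approximation of G^-1 (illustrative sketch, hypothetical name).

    Returns (G_apy_inv, order): rows/columns of G_apy_inv follow `order`,
    i.e. core animals first, then noncore animals.
    """
    n = G.shape[0]
    core = np.asarray(core, dtype=int)
    noncore = np.setdiff1d(np.arange(n), core)
    order = np.concatenate([core, noncore])
    Gs = G[np.ix_(order, order)]
    c = core.size

    Gcc = Gs[:c, :c]          # core x core block
    Gcn = Gs[:c, c:]          # core x noncore block
    Gcc_inv = np.linalg.inv(Gcc)
    P = Gcc_inv @ Gcn         # regression of each noncore animal on the core

    # m_i = g_ii - g_ic Gcc^-1 g_ci: the part of each noncore animal
    # that the core does not explain (diagonal of M_nn)
    m = np.diag(Gs)[c:] - np.einsum("ij,ij->j", Gcn, P)

    G_apy_inv = np.zeros((n, n))
    G_apy_inv[:c, :c] = Gcc_inv
    T = np.vstack([-P, np.eye(n - c)])   # n x (n - c): [-Gcc^-1 Gcn; I]
    G_apy_inv += T @ np.diag(1.0 / m) @ T.T
    return G_apy_inv, order
```

Only the core block needs a dense inverse; the noncore contribution enters through a diagonal matrix, which is why the cost of APY grows roughly with the cube of the core size rather than with the total number of genotyped animals. When the core contains every animal, the sketch reduces to the exact inverse of G.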

METHODS

The aim of this study was to investigate several core definitions: the most popular animals (MPA) (i.e., animals with high contributions to the genetic pool), the least popular males (LPM), the least popular females (LPF), a random set (Rnd), animals evenly distributed across genealogical paths (Ped), unrelated individuals (Unrel), individuals chosen by within-family selection (Fam), and individuals chosen by decomposition of the gene content matrix (QR). Each definition was evaluated for six core sizes based on the prediction accuracy of single-step genomic best linear unbiased prediction (ssGBLUP) with APY, using the prediction accuracy of ssGBLUP with the full inverse of G as the baseline. The dataset consisted of 357k pedigreed Duroc pigs, of which 111k were genotyped, and ~220k phenotypic records.
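The abstract does not spell out how each core is built. As an illustration only, the sketch below implements two of the simpler definitions, assuming Z is an animals × markers matrix of 0/1/2 genotype codes: a random core (Rnd) and a core taken from the pivot order of a column-pivoted QR factorization of the centered gene content matrix. The QR variant is our reading of "decomposition of the gene content matrix", and the study's exact procedure may differ; both function names are hypothetical.

```python
import numpy as np


def random_core(n_genotyped: int, core_size: int, seed: int = 0) -> np.ndarray:
    """Rnd (sketch): a uniformly random subset of genotyped animals."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_genotyped, size=core_size, replace=False))


def qr_pivot_core(Z: np.ndarray, core_size: int) -> np.ndarray:
    """QR-style core (sketch): greedy column pivoting on centered Z^T.

    At each step, pick the animal whose genotype vector has the largest norm
    after projecting out the animals already chosen. This is the pivot order
    of a column-pivoted (Businger-Golub) QR factorization of Z^T.
    """
    # Center each marker (assumption, mirroring how G is usually built),
    # then transpose to markers x animals so that columns are animals.
    X = (Z - Z.mean(axis=0)).T.astype(float)
    norms = np.sum(X * X, axis=0)
    tol = 1e-10 * norms.max()
    chosen = []
    for _ in range(core_size):
        j = int(np.argmax(norms))
        if norms[j] <= tol:
            break  # remaining animals are linear combinations of chosen ones
        chosen.append(j)
        q = X[:, j] / np.sqrt(norms[j])
        X -= np.outer(q, q @ X)  # remove the component along q from every column
        norms = np.sum(X * X, axis=0)
    return np.array(chosen, dtype=int)
```

The pivoted-QR order favors animals whose genotypes are least predictable from those already in the core, so duplicated or redundant genotypes are never selected twice; a random core makes no such guarantee.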

RESULTS

When the core size was equal to the number of largest eigenvalues explaining 50% of the variation of G (n = 160), the MPA and Ped core definitions delivered the highest average prediction accuracies (~0.41-0.53). As the core size increased to the number of eigenvalues explaining 99% of the variation of G (n = 7320), prediction accuracy was nearly identical for all core types, and for every core definition the correlation between genomic estimated breeding values (GEBV) from ssGBLUP with APY and those from ssGBLUP with the full inverse of G exceeded 0.99. Cores that represent all generations, such as Rnd, Ped, Fam, and Unrel, were grouped together in the hierarchical clustering of GEBV.
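The core sizes above are tied to the number of largest eigenvalues of G that together explain a given share of its variation. A minimal numpy sketch of that count, assuming "variation explained" means the cumulative sum of the eigenvalues (sorted from largest to smallest) divided by their total; the function name is hypothetical.

```python
import numpy as np


def n_eigen_for_variance(G: np.ndarray, fraction: float) -> int:
    """Smallest number of largest eigenvalues of G whose sum reaches `fraction` of their total."""
    eigvals = np.linalg.eigvalsh(G)[::-1]   # eigvalsh returns ascending order; flip to descending
    eigvals = np.clip(eigvals, 0.0, None)   # treat tiny negative round-off as zero
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # first index where the cumulative share reaches `fraction`, converted to a count
    k = int(np.searchsorted(cumulative, fraction)) + 1
    return min(k, eigvals.size)
```

For example, `n_eigen_for_variance(G, 0.99)` would give the 99% core size for a given G. With 111k genotyped animals a full eigendecomposition of G is expensive; if G = ZZ'/k for a scalar k, its nonzero eigenvalues equal those of the markers × markers matrix Z'Z/k, which one could presumably decompose instead.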

CONCLUSIONS

For small core sizes, the definition of the core matters; however, as the size of the core reaches an optimal value equal to the number of largest eigenvalues explaining 99% of the variation of G, the definition of the core becomes arbitrary.
