Elsen Jean-Michel
GenPhySE (Génétique, Physiologie et Systèmes d'Elevage), INRA, 31326, Castanet-Tolosan, France.
Animal Genetics and Breeding Unit, University of New England, Armidale, Australia.
Genet Sel Evol. 2016 Mar 3;48:18. doi: 10.1186/s12711-016-0183-3.
Genomic selection has yet to be evaluated and optimized in many species. Mathematical modeling of selection schemes prior to their implementation is a classical and useful tool for that purpose. These models formalize a number of quantities, including the reliability (precision) of the estimated breeding value. To model genomic selection schemes, equations are needed that predict this reliability as a function of factors such as the size of the reference population, its diversity, its genetic distance from the group of genotyped selection candidates, the number of markers, and the strength of linkage disequilibrium. The present paper explores new approximations of this reliability.
Two alternative approximations are proposed for estimating the reliability of genomic estimated breeding values (GEBV) when the candidate and reference populations are not independent. Both were derived from the Taylor series heuristic approach suggested by Goddard in 2009. A numerical exploration of their properties showed that the two series were not equivalent in terms of convergence to the exact reliability, that the approximations may overestimate the precision of GEBV, and that they converged towards their theoretical expectations. The formulae derived for these approximations were simple to handle in the case of independent markers. The few parameters that describe the markers' genotypic variability (allele frequencies, linkage disequilibrium) can be estimated from genomic data on the population of interest, or obtained by making assumptions about their distribution. When markers are not in linkage equilibrium, replacing the real number of markers and QTL by the "effective number of independent loci", as proposed earlier, is a practical solution. In this paper, we considered an alternative, an "equivalent number of independent loci": the size of a subset of independent markers that would give, for unrelated individuals, the same GEBV reliability as the full set of markers.
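For orientation, the "exact reliability" that such approximations are compared against can be sketched numerically. The sketch below is not the paper's approximations. It uses the standard GBLUP expression r² = c'(G + λI)⁻¹c / g₀₀, and it assumes that the markers capture all the genetic variance, that the heritability h² is known, and that each reference animal has one phenotype. All function and variable names are hypothetical, and the genotypes are simulated with independent markers.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = aug[i][n] - sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x


def simulate_standardized_genotypes(n_animals, freqs, rng):
    """Independent markers: genotype = number of copies of one allele,
    centred by 2p and scaled by sqrt(2p(1-p)) (an illustrative assumption)."""
    rows = []
    for _ in range(n_animals):
        row = []
        for p in freqs:
            x = (rng.random() < p) + (rng.random() < p)
            row.append((x - 2 * p) / (2 * p * (1 - p)) ** 0.5)
        rows.append(row)
    return rows


def exact_reliability(z_ref, z_cand, h2):
    """GBLUP reliability of one candidate: c'(G + lam*I)^-1 c / g00."""
    n_markers = len(z_cand)
    lam = (1.0 - h2) / h2  # residual-to-genetic variance ratio

    def rel(u, v):
        # genomic relationship between two standardized genotype rows
        return sum(a * b for a, b in zip(u, v)) / n_markers

    n = len(z_ref)
    g = [[rel(z_ref[i], z_ref[j]) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    c = [rel(z_cand, row) for row in z_ref]  # candidate-reference relationships
    g00 = rel(z_cand, z_cand)                # candidate's own relationship
    x = solve(g, c)
    return sum(ci * xi for ci, xi in zip(c, x)) / g00


if __name__ == "__main__":
    rng = random.Random(2016)
    freqs = [rng.uniform(0.1, 0.9) for _ in range(200)]
    reference = simulate_standardized_genotypes(30, freqs, rng)
    candidate = simulate_standardized_genotypes(1, freqs, rng)[0]
    print(exact_reliability(reference, candidate, h2=0.3))
```

As a sanity check on the expression, a candidate whose genotype is identical to a single phenotyped reference animal with g₀₀ = 1 gets a reliability equal to h², as expected for one own record.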
This paper is a further step towards the development of deterministic models that describe breeding plans based on the use of genomic information. Such deterministic models carry a low computational burden, which allows breeding plan designs to be optimized through intensive numerical exploration.