Hampel Alexander, Teuscher Friedrich, Gomez-Raya Luis, Doschoris Michael, Wittenburg Dörte
Leibniz Institute for Farm Animal Biology (FBN), Institute of Genetics and Biometry, Dummerstorf, Germany.
Departamento de Mejora Genética Animal, Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Madrid, Spain.
Front Genet. 2018 Jun 5;9:186. doi: 10.3389/fgene.2018.00186. eCollection 2018.
A livestock population can be characterized by different population genetic parameters, such as linkage disequilibrium and recombination rate between pairs of genetic markers. Population structure, which may be caused by family stratification, influences the estimates of these parameters. An expectation maximization algorithm has been proposed for estimating these parameters in half-sibs without phasing the progeny. However, it overlooks the fact that the underlying likelihood function may have two maxima. The magnitudes of the maxima depend on the maternal allele frequencies at the investigated marker pair, and which maximum the algorithm converges to depends on the chosen start values. We present a stepwise procedure that exploits the relationship between the two modes. The expectation maximization algorithm for the parameter estimation is applied twice using different start values, followed by a decision process to assess the most likely estimate. This approach was validated using simulated genotypes of half-sibs. It was also applied to a dairy cattle dataset consisting of multiple half-sib families and 39,780 marker genotypes, leading to estimates for 12,759,713 intrachromosomal marker pairs. Furthermore, the proper order of markers was verified by studying the mean of estimated recombination rates in a window adjacent to the investigated locus as well as in a window at its most distant chromosome end. Putatively misplaced markers or marker clusters were detected by comparing the results with the revised bovine genome assembly UMD 3.1.1. In total, 40 markers were identified as candidates for misplacement. This outcome may help improve the physical order of markers, which is also required for refining the bovine genetic map.
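The dependence on start values, and the idea of running EM twice and keeping the better fit, can be illustrated with a deliberately simple stand-in likelihood. This is a minimal sketch, not the half-sib likelihood used in the paper: it assumes a hypothetical binomial mixture in which k of n observations arise with success probability p (weight pi) or 1 - p (weight 1 - pi). That likelihood has two maxima, near k/n and (n - k)/n, whose relative heights depend on pi. The names `loglik`, `em` and `two_start_em`, the start values 0.1 and 0.9, and the decision rule "keep the run with the larger log-likelihood" are illustrative assumptions; the paper's own decision process may differ.

```python
import math


def loglik(p, k, n, pi):
    """Log-likelihood of the toy mixture (hypothetical stand-in, not the half-sib model)."""
    a = pi * p ** k * (1 - p) ** (n - k)          # component A: success probability p
    b = (1 - pi) * (1 - p) ** k * p ** (n - k)    # component B: success probability 1 - p
    return math.log(a + b)


def em(p, k, n, pi, tol=1e-10, max_iter=1000):
    """EM for p; the latent variable is which mixture component generated the data."""
    for _ in range(max_iter):
        a = pi * p ** k * (1 - p) ** (n - k)
        b = (1 - pi) * (1 - p) ** k * p ** (n - k)
        w = a / (a + b)                           # E-step: P(component A | data, p)
        p_new = (w * k + (1 - w) * (n - k)) / n   # M-step: expected success fraction
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p


def two_start_em(k, n, pi, starts=(0.1, 0.9)):
    """Run EM from two start values; keep the estimate with the larger log-likelihood."""
    fits = [em(s, k, n, pi) for s in starts]
    best = max(fits, key=lambda p: loglik(p, k, n, pi))
    return best, fits


if __name__ == "__main__":
    best, fits = two_start_em(k=4, n=20, pi=0.3)
    print("estimates per start:", [round(f, 4) for f in fits])
    print("kept estimate:", round(best, 4))
```

With k = 4, n = 20 and pi = 0.3, the run started at 0.1 should settle near 0.2 and the run started at 0.9 near 0.8; the second has the larger likelihood and is kept. Setting pi = 0.7 reverses the choice, loosely analogous to how the relative heights of the two modes depend on the maternal allele frequencies in the real model.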
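The marker-order check can be sketched as follows, assuming a symmetric matrix of pairwise recombination-rate estimates with markers listed in their assumed physical order. For each marker, the mean estimate over a window of adjacent markers is compared with the mean over a window at the chromosome end farther from it. A correctly placed marker should show small values nearby and values approaching 0.5 at the distant end. The window size `w`, the two-sided adjacent window, the helper names and the flagging rule (adjacent mean not smaller than distant mean) are hypothetical, and the comparison with UMD 3.1.1 is not reproduced. The demo uses recombination rates computed with the Haldane mapping function from a made-up map rather than estimates from data.

```python
import math
from statistics import mean


def haldane(d):
    """Recombination rate for map distance d in Morgans (Haldane mapping function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def window_means(theta, i, w):
    """Mean recombination rate of marker i with (a) its neighbours and
    (b) the w markers at the chromosome end farther from i.

    theta: symmetric matrix (list of lists), markers in their assumed order.
    w: window size in markers (hypothetical parameter).
    """
    m = len(theta)
    # Adjacent window: up to w markers on each side of i (an assumption).
    near = [j for j in range(max(0, i - w), min(m, i + w + 1)) if j != i]
    # Distant window: the w markers at whichever chromosome end is farther from i.
    far = range(m - w, m) if i < m / 2 else range(0, w)
    return mean(theta[i][j] for j in near), mean(theta[i][j] for j in far)


def flag_misplaced(theta, w=2):
    """Flag markers whose adjacent mean is not below their distant mean (hypothetical rule)."""
    if len(theta) <= 2 * w:
        raise ValueError("need more than 2*w markers")
    flagged = []
    for i in range(len(theta)):
        near_mean, far_mean = window_means(theta, i, w)
        if near_mean >= far_mean:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Made-up map (Morgans) in the assumed order; the marker at index 2
    # truly belongs at the far end of the chromosome (1.8 M).
    pos = [0.0, 0.2, 1.8, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6]
    theta = [[haldane(abs(a - b)) for b in pos] for a in pos]
    print(flag_misplaced(theta, w=2))
```

In the demo, the marker placed at index 2 shows large recombination rates with its assumed neighbours and small ones with the markers at the distant end, so it should be the only marker flagged.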