Unit of Animal Genomics, Faculty of Veterinary Medicine and Centre for Biomedical Integrative Genoproteomics, University of Liège, Liège, Belgium.
Genetics. 2011 Jun;188(2):409-19. doi: 10.1534/genetics.111.127720. Epub 2011 Mar 24.
Identity-by-descent probabilities are important for many applications in genetics. Here we propose a method for modeling the transmission of haplotypes from the closest genotyped relatives along an entire chromosome. The method relies on a hidden Markov model in which the hidden states correspond to the set of all possible origins of a haplotype within a given pedigree. Initial state probabilities are estimated from the average genetic contribution of each origin to the modeled haplotype, while transition probabilities are computed from recombination probabilities and the pedigree relationships between the modeled haplotype and the various possible origins. The method was tested on three simulated scenarios based on real data sets from dairy cattle, Arabidopsis thaliana, and maize. The mean identity-by-descent probabilities estimated for the truly inherited parental chromosome ranged from 0.94 to 0.98, depending on the design and the marker density. The lowest values were observed in regions close to crossovers or where the method could not discriminate between several origins because of their similarity. The estimated probabilities were shown to be correctly calibrated. For marker imputation (or QTL allele prediction for fine mapping or genomic selection), the method was efficient, with a 3.75% allelic imputation error rate on a dairy cattle data set with a low-density marker map (1 SNP/Mb). The method should prove useful in situations now common in experimental designs and in plant and animal breeding, where founders are genotyped at relatively high marker densities and the last generation(s) are genotyped with a lower-density panel.
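The general idea of such a hidden Markov model can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes fully phased biallelic haplotypes coded 0/1, a fixed genotyping error rate `err`, and a simplified transition rule in which, after a recombination, the new origin is drawn in proportion to its average contribution `contrib` (the paper instead derives transitions from the pedigree relationships). The function names `origin_posteriors` and `impute` and all parameter names are hypothetical.

```python
def _normalize(values):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]


def origin_posteriors(target, origins, contrib, recomb, err=0.01):
    """Posterior probability of each origin at each marker (forward-backward).

    target  : alleles of the modeled haplotype (0/1, None if ungenotyped)
    origins : candidate origin haplotypes, each a list of 0/1 alleles
    contrib : initial state probabilities (assumed average contributions)
    recomb  : recombination probability between consecutive markers
    err     : assumed probability that an observed allele is wrong
    """
    n_states, n_markers = len(origins), len(target)

    def emit(k, m):
        # Ungenotyped markers carry no information about the origin.
        if target[m] is None:
            return 1.0
        return 1.0 - err if origins[k][m] == target[m] else err

    def trans(i, j, r):
        # Simplifying assumption: after a recombination the new origin is
        # drawn from `contrib`, ignoring the detailed pedigree structure.
        return (1.0 - r) * (1.0 if i == j else 0.0) + r * contrib[j]

    # Forward pass, rescaled at every marker to avoid numerical underflow.
    fwd = [_normalize([contrib[k] * emit(k, 0) for k in range(n_states)])]
    for m in range(1, n_markers):
        prev, r = fwd[-1], recomb[m - 1]
        fwd.append(_normalize([
            emit(j, m) * sum(prev[i] * trans(i, j, r) for i in range(n_states))
            for j in range(n_states)
        ]))

    # Backward pass, rescaled in the same way.
    bwd = [[1.0] * n_states for _ in range(n_markers)]
    for m in range(n_markers - 2, -1, -1):
        nxt, r = bwd[m + 1], recomb[m]
        bwd[m] = _normalize([
            sum(trans(i, j, r) * emit(j, m + 1) * nxt[j] for j in range(n_states))
            for i in range(n_states)
        ])

    return [_normalize([f * b for f, b in zip(fwd[m], bwd[m])])
            for m in range(n_markers)]


def impute(posteriors, origins):
    """Expected allele at each marker, weighting origins by their posterior."""
    return [sum(p * origins[k][m] for k, p in enumerate(post_m))
            for m, post_m in enumerate(posteriors)]


# Toy example: two parental origins and one crossover in the modeled haplotype.
origins = [[0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1]]
target = [0, 0, None, 1, 1, 1]
post = origin_posteriors(target, origins, contrib=[0.5, 0.5], recomb=[0.01] * 5)
dosage = impute(post, origins)
```

In this toy case the posterior favors the first origin at the left end of the chromosome and the second at the right end, while the ungenotyped marker next to the crossover stays ambiguous. This mirrors the abstract's remark that the lowest probabilities occur close to crossovers.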