Department of Computer Science, University of North Carolina at Chapel Hill, USA.
Bioinformatics. 2010 Jun 15;26(12):i199-207. doi: 10.1093/bioinformatics/btq187.
High-density SNP data from model animal resources provide opportunities for fine-resolution studies of genetic variation. These genetic resources are generated through a variety of breeding schemes that involve multiple generations of matings derived from a set of founder animals. In this article, we investigate the problem of inferring the most probable ancestry of the resulting genotypes, given a set of founder genotypes. Because of the computational difficulty, existing methods either handle only small pedigrees or disregard the pedigree structure. However, large pedigrees of model animal resources often contain repetitive substructures that can be exploited to accelerate computation.
We present an accurate and efficient method for inferring genome ancestry that accepts complex pedigrees with inbreeding. Inbreeding is commonly used to generate genetically diverse and reproducible animals. It is often carried out for many generations and can account for most of the computational complexity in real-world model animal pedigrees. Our method builds a hidden Markov model that derives the ancestry probabilities through the inbreeding process without explicitly modeling every generation. The ancestry inference is accurate and fast, independent of the number of generations, for model animal resources such as the Collaborative Cross (CC). Experiments on both simulated and real CC data demonstrate that our method offers accuracy comparable to that of methods that build an explicit model of the entire pedigree, but scales much better with pedigree size.
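To make the idea of HMM-based ancestry inference concrete, the following is a minimal sketch, not the method described in this article. It assumes haploid founder haplotypes coded as 0/1 alleles, a single fixed probability of switching founders between adjacent markers, and a fixed genotyping error rate; it does not model the pedigree or the inbreeding process at all. The function name `viterbi_ancestry` and the parameters `switch_prob` and `error_rate` are hypothetical, introduced only for illustration. It uses the standard Viterbi algorithm to recover the most probable founder at each marker.

```python
import math


def viterbi_ancestry(founders, observed, switch_prob=0.01, error_rate=0.01):
    """Return the most probable founder index at each marker.

    founders: list of founder haplotypes (lists of 0/1 alleles), at least two.
    observed: list of 0/1 alleles for the descendant haplotype.
    switch_prob, error_rate: illustrative constants, not values from the article.
    """
    n_founders = len(founders)
    n_markers = len(observed)
    log_stay = math.log(1.0 - switch_prob)
    # A switch is assumed equally likely to land on any other founder.
    log_switch = math.log(switch_prob / (n_founders - 1))

    def log_emit(f, m):
        # Observed allele matches the founder allele unless a genotyping error occurred.
        match = founders[f][m] == observed[m]
        return math.log(1.0 - error_rate if match else error_rate)

    def log_trans(p, f):
        return log_stay if p == f else log_switch

    # Uniform prior over founders at the first marker.
    score = [math.log(1.0 / n_founders) + log_emit(f, 0) for f in range(n_founders)]
    back = []  # back[m - 1][f] = best predecessor of state f at marker m

    for m in range(1, n_markers):
        new_score, pointers = [], []
        for f in range(n_founders):
            best_prev = max(range(n_founders),
                            key=lambda p: score[p] + log_trans(p, f))
            new_score.append(score[best_prev] + log_trans(best_prev, f) + log_emit(f, m))
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_founders), key=lambda f: score[f])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


founders = [[0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1]]
print(viterbi_ancestry(founders, [0, 0, 0, 1, 1, 1]))  # [0, 0, 0, 1, 1, 1]
```

In this toy setting the states are single founders and the transition probability is a constant. A pedigree-aware model of the kind the abstract describes would instead derive its transition probabilities from the breeding scheme, which this sketch does not attempt.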