He Dan, Wang Zhanyong, Han Buhm, Parida Laxmi, Eskin Eleazar
IBM T.J. Watson Research, Yorktown Heights, New York.
J Comput Biol. 2013 Oct;20(10):780-91. doi: 10.1089/cmb.2013.0080.
The problem of inference of family trees, or pedigree reconstruction, for a group of individuals is a fundamental problem in genetics. Various methods have been proposed to automate the process of pedigree reconstruction given the genotypes or haplotypes of a set of individuals. Current methods, unfortunately, are very time-consuming and inaccurate for complicated pedigrees, such as pedigrees with inbreeding. In this work, we propose an efficient algorithm that is able to reconstruct large pedigrees with reasonable accuracy. Our algorithm reconstructs the pedigrees generation by generation, backward in time from the extant generation. We predict the relationships between individuals in the same generation using an inheritance path-based approach implemented with an efficient dynamic programming algorithm. Experiments show that our algorithm runs in linear time with respect to the number of reconstructed generations, and therefore, it can reconstruct pedigrees that have a large number of generations. Indeed, it is the first practical method for reconstruction of large pedigrees from genotype data.
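The generation-by-generation idea in the abstract can be illustrated with a small Python sketch. This is not the authors' algorithm: the inheritance path-based dynamic programming step is replaced by a hypothetical caller-supplied predicate `is_sibling`, and the function names, the placeholder ancestor IDs (`G1_F0`, `G1_M0`, ...), and the one-couple-per-sibling-group rule are all assumptions made for illustration. The sketch only shows the outer loop: group the current generation into sibling sets, create a pair of unobserved parents for each set, and repeat one level up.

```python
from itertools import combinations


def group_siblings(nodes, is_sibling):
    """Cluster nodes into sibling sets (connected components of is_sibling)."""
    root = {n: n for n in nodes}

    def find(n):
        # Union-find lookup with path halving.
        while root[n] != n:
            root[n] = root[root[n]]
            n = root[n]
        return n

    for a, b in combinations(nodes, 2):
        if is_sibling(a, b):
            root[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())


def reconstruct(extant, is_sibling, generations):
    """Build a pedigree upward from the extant generation.

    Returns a dict mapping each child to a (father, mother) pair of
    placeholder IDs. `is_sibling` is a hypothetical stand-in for a real
    relationship predictor and is not part of the published method.
    """
    pedigree = {}
    current = list(extant)
    for g in range(1, generations + 1):  # one pass per reconstructed generation
        next_gen = []
        for i, sibs in enumerate(group_siblings(current, is_sibling)):
            # Assumption: every sibling set shares one unobserved couple.
            father, mother = f"G{g}_F{i}", f"G{g}_M{i}"
            for child in sibs:
                pedigree[child] = (father, mother)
            next_gen += [father, mother]
        current = next_gen
    return pedigree


if __name__ == "__main__":
    # Toy oracle: a and b are siblings, c and d are siblings, and the two
    # inferred fathers are themselves siblings (so a and c are first cousins).
    known = {frozenset(p) for p in [("a", "b"), ("c", "d"), ("G1_F0", "G1_F1")]}
    oracle = lambda x, y: frozenset((x, y)) in known
    for child, parents in reconstruct(["a", "b", "c", "d"], oracle, 2).items():
        print(child, parents)
```

In this sketch the outer loop runs once per reconstructed generation, which mirrors the linear dependence on the number of generations reported in the abstract. A real predictor would have to infer relationships among unobserved ancestors from the genotypes of their descendants rather than look them up by ID.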