Kirkpatrick Bonnie, Reshef Yakir, Finucane Hilary, Jiang Haitao, Zhu Binhai, Karp Richard M
Electrical Engineering and Computer Sciences, University of California, Berkeley, California, USA.
J Comput Biol. 2012 Sep;19(9):998-1014. doi: 10.1089/cmb.2011.0254. Epub 2012 Aug 16.
Pedigree graphs, or family trees, are typically constructed by an expensive process of examining genealogical records to determine which pairs of individuals are parent and child. New methods to automate this process take as input genetic data from a set of extant individuals and reconstruct ancestral individuals. There is a great need to evaluate the quality of these methods by comparing the estimated pedigree to the true pedigree. In this article, we consider two main pedigree comparison problems. The first is the pedigree isomorphism problem, for which we present a linear-time algorithm for leaf-labeled pedigrees. The second is the pedigree edit distance problem, for which we present (1) several algorithms that are fast and exact in various special cases, and (2) a general, randomized heuristic algorithm. In the negative direction, we first prove that the pedigree isomorphism problem is as hard as the general graph isomorphism problem, and that the sub-pedigree isomorphism problem is NP-hard. We then show that the pedigree edit distance problem is APX-hard in general and NP-hard on leaf-labeled pedigrees. We use simulated pedigrees to compare our edit-distance algorithms to each other as well as to a branch-and-bound algorithm that always finds an optimal solution.
系谱图,即家族树,通常通过一个昂贵的过程来构建,该过程需查阅族谱记录以确定哪些个体对是父母与子女关系。自动化此过程的新方法将一组现存个体的遗传数据作为输入,并重建祖先个体。通过将估计的系谱与真实系谱进行比较来评估这些方法的质量非常有必要。在本文中,我们考虑两个主要的系谱比较问题。第一个是系谱同构问题,对于该问题,我们提出了一种针对叶标记系谱的线性时间算法。第二个是系谱编辑距离问题,对于该问题,我们提出了(1)在各种特殊情况下快速且精确的几种算法,以及(2)一种通用的随机启发式算法。在负面结果方面,我们首先证明系谱同构问题与一般图同构问题一样困难,并且子系谱同构问题是NP难的。然后我们表明,系谱编辑距离问题总体上是APX难的,并且在叶标记系谱上是NP难的。我们使用模拟系谱将我们的编辑距离算法相互比较,以及与总是找到最优解的分支定界算法进行比较。