Dexter Daniel, Brown Daniel G
David R Cheriton School of Computer Science, University of Waterloo, Waterloo, ON N2L 3G1, Canada.
Algorithms Mol Biol. 2013 Jul 12;8(1):20. doi: 10.1186/1748-7188-8-20.
Kinship inference is the task of identifying genealogically related individuals. Kinship information is important for determining mating structures, notably in endangered populations. Although many solutions exist for reconstructing full sibling relationships, few exist for half-siblings.
We consider the problem of determining whether a proposed half-sibling population reconstruction is valid under Mendelian inheritance assumptions. We show that this problem is NP-complete and provide a 0/1 integer program that finds the minimum number of individuals that must be removed from a population for the reconstruction to become valid. We also present SibJoin, a fast clustering heuristic grounded in Mendelian genetics. The software is available at http://github.com/ddexter/SibJoin.git+.
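To make the notion of validity concrete, here is a minimal sketch of one necessary condition only. It assumes diploid genotypes stored as allele pairs with no missing data, and it checks a single proposed half-sibling group in isolation: the shared parent carries at most two alleles per locus, so every member must carry at least one of them. The function names and the data layout are illustrative assumptions, not part of SibJoin, and the sketch does not address the full reconstruction problem, in which each individual belongs to a maternal and a paternal group at once.

```python
from itertools import combinations_with_replacement


def locus_compatible(genotypes):
    """Check one locus of one proposed half-sibling group.

    genotypes: list of (allele, allele) tuples, one per individual.
    Returns True if some pair of alleles (a homozygous or heterozygous
    shared parent) gives every individual at least one of its alleles.
    """
    if not genotypes:
        return True
    alleles = sorted({a for g in genotypes for a in g})
    # Try every candidate genotype for the hypothetical shared parent.
    for parent in combinations_with_replacement(alleles, 2):
        if all(g[0] in parent or g[1] in parent for g in genotypes):
            return True
    return False


def group_compatible(individuals):
    """Check every locus of one proposed half-sibling group.

    individuals: list of individuals, each a list of (allele, allele)
    tuples with one tuple per locus, in the same locus order.
    """
    if not individuals:
        return True
    n_loci = len(individuals[0])
    return all(
        locus_compatible([ind[k] for ind in individuals])
        for k in range(n_loci)
    )
```

For example, `group_compatible([[(1, 2)], [(1, 3)], [(2, 4)]])` holds because a shared parent with genotype (1, 2) explains all three individuals, whereas three individuals with genotypes (1, 2), (3, 4), and (5, 6) at the same locus cannot share a parent.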
Our SibJoin algorithm is reasonably accurate and thousands of times faster than existing algorithms. We use the heuristic to infer a half-sibling structure for a population that was, until recently, too large to evaluate.