IEEE/ACM Trans Comput Biol Bioinform. 2019 Jul-Aug;16(4):1364-1373. doi: 10.1109/TCBB.2017.2661761. Epub 2017 Jan 31.
Reconstructing ancestral gene orders in a given phylogeny is a classical problem in comparative genomics. Most existing methods compare conserved features in extant genomes in the phylogeny to define potential ancestral gene adjacencies, and either try to reconstruct all ancestral genomes under a global evolutionary parsimony criterion, or, focusing on a single ancestral genome, use a scaffolding approach to select a subset of ancestral gene adjacencies, generally aiming at reducing the fragmentation of the reconstructed ancestral genome. In this paper, we describe an exact algorithm for the Small Parsimony Problem that combines both approaches. We consider that gene adjacencies at internal nodes of the species phylogeny are weighted, and we introduce an objective function defined as a convex combination of these weights and the evolutionary cost under the Single-Cut-or-Join (SCJ) model. The weights of ancestral gene adjacencies can, e.g., be obtained through the recent availability of ancient DNA sequencing data, which provide a direct hint at the genome structure of the considered ancestor, or through probabilistic analysis of gene adjacencies evolution. We show the NP-hardness of our problem variant and propose a Fixed-Parameter Tractable algorithm based on the Sankoff-Rousseau dynamic programming algorithm that also allows to sample co-optimal solutions. We apply our approach to mammalian and bacterial data providing different degrees of complexity. We show that including adjacency weights in the objective has a significant impact in reducing the fragmentation of the reconstructed ancestral gene orders. An implementation is available at http://github.com/nluhmann/PhySca.
在给定的系统发育树中重建祖先基因顺序是比较基因组学中的一个经典问题。大多数现有的方法比较系统发育树中现存基因组中的保守特征,以定义潜在的祖先基因邻接,并试图在全局进化简约准则下重建所有祖先基因组,或者,专注于单个祖先基因组,使用支架方法选择祖先基因邻接的子集,通常旨在减少重建祖先基因组的碎片化。在本文中,我们描述了一种用于小简约问题的精确算法,该算法结合了这两种方法。我们考虑物种系统发育树内部节点的基因邻接是加权的,并引入了一个目标函数,该函数是这些权重和单切或连接 (SCJ) 模型下的进化成本的凸组合。例如,祖先基因邻接的权重可以通过最近获得的古代 DNA 测序数据获得,这些数据直接提示了所考虑的祖先的基因组结构,或者通过基因邻接进化的概率分析获得。我们证明了我们的问题变体的 NP 难度,并提出了一种基于 Sankoff-Rousseau 动态规划算法的固定参数可解算法,该算法还允许对最优解进行采样。我们将我们的方法应用于哺乳动物和细菌数据,提供了不同程度的复杂性。我们表明,在目标中包含邻接权重对减少重建祖先基因顺序的碎片化有显著影响。一个实现可在 http://github.com/nluhmann/PhySca 上获得。