Moutet Jordan, Rivals Eric, Pardi Fabio
LIRMM, Université de Montpellier, CNRS, Montpellier, France.
PLoS Comput Biol. 2025 Jul 28;21(7):e1012585. doi: 10.1371/journal.pcbi.1012585. eCollection 2025 Jul.
Ancestral sequence reconstruction is an important task in bioinformatics, with applications ranging from protein engineering to the study of genome evolution. When sequences can only undergo substitutions, optimal reconstructions can be computed efficiently using well-known algorithms. However, accounting for indels in ancestral reconstructions is much harder. First, for biologically relevant problem formulations, no polynomial-time exact algorithms are available. Second, multiple reconstructions are often equally parsimonious or equally likely, making it crucial to correctly display uncertainty in the results. Here, we consider a parsimony approach in which only deletions are allowed, and we address both of these limitations. First, we describe an exact algorithm that obtains all optimal solutions; it runs in polynomial time if only one solution is sought. Second, we show that all possible optimal reconstructions for a fixed node can be represented using a graph computable in polynomial time. While previous studies have proposed graph-based representations of ancestral reconstructions, this result is the first to offer a solid mathematical justification for this approach. Finally, we provide arguments for the relevance of the deletion-only case to the general case.
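As an illustration of the substitution-only case mentioned in the abstract, the sketch below implements Fitch's small-parsimony algorithm, one of the classical methods for this setting. It is not the deletion-only algorithm of the paper. It handles a single alignment column on a rooted binary tree, and the function name `fitch` and the nested-tuple tree encoding are assumptions chosen for this example.

```python
from typing import FrozenSet, Tuple, Union

# Hypothetical tree encoding for this sketch: a leaf is a one-character
# state such as "A"; an internal node is a pair (left_subtree, right_subtree).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree) -> Tuple[FrozenSet[str], int]:
    """Bottom-up pass of Fitch's algorithm for one site.

    Returns the set of states that are optimal at the root of `tree`
    and the minimum number of substitutions needed on the tree.
    """
    if isinstance(tree, str):
        # A leaf has a single observed state and costs nothing.
        return frozenset([tree]), 0

    left, right = tree
    left_states, left_cost = fitch(left)
    right_states, right_cost = fitch(right)

    common = left_states & right_states
    if common:
        # The children agree on at least one state: no extra substitution.
        return common, left_cost + right_cost
    # The children disagree: keep every candidate and pay one substitution.
    return left_states | right_states, left_cost + right_cost + 1


if __name__ == "__main__":
    states, cost = fitch((("A", "C"), ("A", "G")))
    print(sorted(states), cost)  # ['A'] 2

    states, cost = fitch((("A", "A"), "C"))
    print(sorted(states), cost)  # ['A', 'C'] 1
```

In the second call the root has two equally parsimonious states, a small instance of the ambiguity that the abstract says must be displayed correctly.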