Bou Dagher Léa, Madern Dominique, Malbos Philippe, Brochier-Armanet Céline
Université Claude Bernard Lyon 1, LBBE, UMR 5558, CNRS, VAS, Villeurbanne F-69622, France.
Université Claude Bernard Lyon 1, CNRS, Institut Camille Jordan, UMR5208, Villeurbanne F-69622, France.
Mol Biol Evol. 2025 Feb 3;42(2). doi: 10.1093/molbev/msae271.
Phylogenetic inference is mainly based on sequence analysis and requires reliable alignments. This can be challenging, especially when sequences are highly divergent. In this context, the use of three-dimensional protein structures is a promising alternative. In a recent study, we introduced an original topological data analysis method based on persistent homology to estimate the evolutionary distances from structures. The method was successfully tested on 518 protein families representing 22,940 predicted structures. However, as anticipated, the reliability of the estimated evolutionary distances was impacted by the quality of the predicted structures and the presence of indels in the proteins. This paper introduces a new topological descriptor, called bio-topological marker (BTM), which provides a more faithful description of the structures, a topological analysis for estimating evolutionary distances from BTMs, and a new weight-filtering method adapted to protein structures. These new developments significantly improve the estimation of evolutionary distances and phylogenies inferred from structures.