Nesterenko Luca, Blassel Luc, Veber Philippe, Boussau Bastien, Jacob Laurent
Laboratoire de Biométrie et Biologie Évolutive, Université Lyon 1, Villeurbanne, France.
Laboratory of Computational and Quantitative Biology, Sorbonne Université, Paris, France.
Mol Biol Evol. 2025 Apr 1;42(4). doi: 10.1093/molbev/msaf051.
Phylogenetic inference aims to reconstruct the tree describing the evolution of a set of sequences descending from a common ancestor. The high computational cost of state-of-the-art maximum likelihood and Bayesian inference methods limits their usability under realistic evolutionary models. Harnessing recent advances in likelihood-free inference and geometric deep learning, we introduce Phyloformer, a fast and accurate method for evolutionary distance estimation and phylogenetic reconstruction. Sampling many trees and sequences under an evolutionary model, we train the network to learn a function that predicts a tree from a multiple sequence alignment. On simulated data, we compare Phyloformer to FastME (a distance method) and to two maximum likelihood methods, FastTree and IQTree. Under a commonly used model of protein sequence evolution and exploiting graphics processing unit (GPU) acceleration, Phyloformer outpaces all other approaches and exceeds their accuracy in the Kuhner-Felsenstein metric, which accounts for both topology and branch lengths. In terms of topological accuracy alone, Phyloformer outperforms FastME but falls behind the maximum likelihood approaches, especially as the number of sequences increases. When a model of sequence evolution that includes dependencies between sites is used, Phyloformer outperforms all other methods across all metrics on alignments with fewer than 80 sequences. On 3,801 empirical gene alignments from five different datasets, Phyloformer matches the topological accuracy of the two maximum likelihood implementations. Our results pave the way for the adoption of sophisticated realistic models for phylogenetic inference.
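The abstract separates topological accuracy from the Kuhner-Felsenstein metric, which also accounts for branch lengths. The sketch below illustrates that distinction on two toy four-leaf trees. It is not taken from Phyloformer: the nested-tuple tree encoding and the function names (`leaf_names`, `tree_splits`, `rf_distance`, `kf_distance`) are assumptions made for this example. The topological score shown is the Robinson-Foulds distance, used here as one common choice; the abstract does not name the topological metric.

```python
import math

# Hypothetical tree encoding for this sketch:
#   leaf          -> (name, branch_length)
#   internal node -> ([child, child, ...], branch_length)

def leaf_names(node):
    """Return the set of leaf names below a node."""
    label, _ = node
    if isinstance(label, str):
        return frozenset([label])
    return frozenset().union(*(leaf_names(child) for child in label))

def _collect(node, all_leaves, ref, out):
    """Record every branch as a split of the leaf set, keyed by the side
    that does not contain the reference leaf `ref`."""
    label, length = node
    if isinstance(label, str):
        below = frozenset([label])
    else:
        below = frozenset().union(
            *(_collect(child, all_leaves, ref, out) for child in label)
        )
    side = below if ref not in below else all_leaves - below
    if side:  # the root itself induces no split
        # Summing merges the two branches that meet at a bifurcating root.
        out[side] = out.get(side, 0.0) + length
    return below

def tree_splits(tree):
    """Map each split (including leaf branches) to its branch length."""
    all_leaves = leaf_names(tree)
    out = {}
    _collect(tree, all_leaves, min(all_leaves), out)
    return out

def rf_distance(a, b):
    """Robinson-Foulds: number of splits present in exactly one tree."""
    return len(tree_splits(a).keys() ^ tree_splits(b).keys())

def kf_distance(a, b):
    """Kuhner-Felsenstein: Euclidean distance between branch-length
    vectors indexed by split, with absent splits given length 0."""
    sa, sb = tree_splits(a), tree_splits(b)
    return math.sqrt(
        sum((sa.get(s, 0.0) - sb.get(s, 0.0)) ** 2 for s in sa.keys() | sb.keys())
    )

# ((A:0.1,B:0.2):0.3,C:0.1,D:0.4) versus ((A:0.1,C:0.2):0.3,B:0.1,D:0.4)
tree1 = ([([("A", 0.1), ("B", 0.2)], 0.3), ("C", 0.1), ("D", 0.4)], 0.0)
tree2 = ([([("A", 0.1), ("C", 0.2)], 0.3), ("B", 0.1), ("D", 0.4)], 0.0)

print(rf_distance(tree1, tree2))
print(kf_distance(tree1, tree2))
```

Two trees with identical topology have a Robinson-Foulds distance of zero even when their branch lengths disagree, whereas the Kuhner-Felsenstein distance is zero only if every branch length matches as well. This is why a method can rank differently under the two metrics.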