Machine learning can be as good as maximum likelihood when reconstructing phylogenetic trees and determining the best evolutionary model on four taxon alignments.

Affiliations

Molecular Evolutionary Biology, Department of Biology, Hamburg University, Germany; Leibniz Institute for the Analysis of Biodiversity Change (LIB), Germany.

Leibniz Institute for the Analysis of Biodiversity Change (LIB), Germany; Medical Faculty, Heidelberg University, Germany.

Publication information

Mol Phylogenet Evol. 2024 Nov;200:108181. doi: 10.1016/j.ympev.2024.108181. Epub 2024 Aug 30.

Abstract

Phylogenetic tree reconstruction with molecular data is important in many fields of life science research. The gold standard in this discipline is tree reconstruction based on the Maximum Likelihood method. In this study, we present neural networks to predict the best model of sequence evolution and the correct topology for four-sequence alignments of nucleotide or amino acid data. We trained neural networks with different architectures using simulated alignments for a wide range of evolutionary models, model parameters and branch lengths. By comparing the accuracy of model and topology prediction of the trained neural networks with Maximum Likelihood and Neighbour Joining methods, we show that for quartet trees, the neural network classifier outperforms the Neighbour Joining method and is in most cases as good as the Maximum Likelihood method at inferring the best model of sequence evolution and the best tree topology. These results are consistent for nucleotide and amino acid sequence data. We also show that our method is superior for model selection to previously published methods based on convolutional networks. Furthermore, we found that neural network classifiers are much faster than the IQ-TREE implementation of the Maximum Likelihood method. Our results show that neural networks could become a true competitor for the Maximum Likelihood method in phylogenetic reconstructions.
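To make the quartet classification task concrete: four taxa admit exactly three unrooted topologies, so topology prediction is a three-class problem. The sketch below is not the authors' neural network. It is a minimal distance-based baseline (the four-point method, which for four taxa selects the same pairing that Neighbour Joining joins first), assuming uncorrected p-distances and nucleotide data. The function names `p_distance` and `quartet_topology` and the toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites; columns with gaps or ambiguity codes are skipped."""
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def quartet_topology(seqs: dict[str, str]) -> str:
    """Return the split (e.g. 'A,B|C,D') with the smallest sum of within-pair distances.

    Assumption: uncorrected p-distances stand in for model-based distances.
    """
    names = sorted(seqs)
    if len(names) != 4:
        raise ValueError("exactly four sequences are required")
    a, b, c, d = names
    # Pairwise distances keyed by unordered pair of taxon names.
    dist = {
        frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
        for pair in combinations(names, 2)
    }

    def dij(x: str, y: str) -> float:
        return dist[frozenset((x, y))]

    # The three possible unrooted quartet topologies and their pair sums.
    sums = {
        f"{a},{b}|{c},{d}": dij(a, b) + dij(c, d),
        f"{a},{c}|{b},{d}": dij(a, c) + dij(b, d),
        f"{a},{d}|{b},{c}": dij(a, d) + dij(b, c),
    }
    # Four-point condition: the supported split has the smallest sum.
    return min(sums, key=sums.get)


if __name__ == "__main__":
    toy = {
        "A": "ACGTACGTAC",
        "B": "ACGTACGTAA",
        "C": "TTGTACGAAC",
        "D": "TTGTACGAAG",
    }
    print(quartet_topology(toy))
```

A classifier of the kind described in the abstract would replace the `min` over pair sums with a learned function of the alignment, while the three output classes stay the same.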
