Center for Integrative Bioinformatics Vienna, Max Perutz Labs, University of Vienna and Medical University of Vienna, Vienna BioCenter (VBC) 5, 1030 Vienna, Austria.
School of Mathematical Sciences, University of Adelaide, Adelaide, SA 5005, Australia; ARC Centre of Excellence for Mathematical and Statistical Frontiers, University of Adelaide, Adelaide, SA 5005, Australia.
Mol Phylogenet Evol. 2023 Nov;188:107905. doi: 10.1016/j.ympev.2023.107905. Epub 2023 Aug 16.
Selecting the best model of sequence evolution for a multiple sequence alignment (MSA) constitutes the first step of phylogenetic tree reconstruction. Common approaches for inferring nucleotide models typically apply maximum likelihood (ML) methods, with discrimination between models determined by one of several information criteria. This requires tree reconstruction and optimisation, which can be computationally expensive. We demonstrate that neural networks can be used to perform model selection, without the need to reconstruct trees, optimise parameters, or calculate likelihoods. We introduce ModelRevelator, a model selection tool underpinned by two deep neural networks. The first neural network, NNmodelfind, recommends one of six commonly used models of sequence evolution, ranging in complexity from Jukes and Cantor to General Time Reversible. The second, NNalphafind, recommends whether or not a Γ-distributed model of rate heterogeneity should be incorporated and, if so, provides an estimate of the shape parameter, α. Users can simply input an MSA into ModelRevelator and swiftly receive output recommending the evolutionary model, inclusive of the presence or absence of rate heterogeneity, and an estimate of α. We show that ModelRevelator performs comparably with likelihood-based methods and the recently published machine learning method ModelTeller over a wide range of parameter settings, with significant potential savings in computational effort. Further, we show that this performance is not restricted to the alignments on which the networks were trained, but is maintained even on unseen empirical data. We expect that ModelRevelator will provide a valuable alternative for phylogeneticists, especially where traditional methods of model selection are computationally prohibitive.
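The likelihood-based baseline mentioned in the abstract, in which candidate models are compared using an information criterion, can be illustrated with a minimal sketch. This is not ModelRevelator's method or code: it only shows the conventional AIC/BIC comparison that the neural networks are meant to replace. The candidate set (JC, K2P, F81, HKY, TN93, GTR) is an assumption, since the abstract names only the two endpoints; the function names are hypothetical; and the log-likelihood values are made up for illustration. In practice each log-likelihood would come from ML optimisation on a tree, which is the expensive step.

```python
import math

# Free substitution-model parameters (standard counts, branch lengths excluded).
# The candidate set itself is an assumption made for this sketch.
FREE_PARAMS = {"JC": 0, "K2P": 1, "F81": 3, "HKY": 4, "TN93": 5, "GTR": 8}


def n_params(model, n_taxa):
    """Total free parameters: model parameters, optional gamma shape, branch lengths."""
    base, _, suffix = model.partition("+")
    k = FREE_PARAMS[base]
    if suffix == "G":
        k += 1  # one extra parameter for the gamma shape, alpha
    return k + (2 * n_taxa - 3)  # branches of an unrooted binary tree


def aic(lnl, k):
    """Akaike information criterion (lower is better)."""
    return 2 * k - 2 * lnl


def bic(lnl, k, n_sites):
    """Bayesian information criterion, using alignment length as sample size."""
    return k * math.log(n_sites) - 2 * lnl


def select_model(log_likelihoods, n_taxa, n_sites, criterion="BIC"):
    """Return (best_model, scores) for a dict mapping model name to log-likelihood."""
    scores = {}
    for model, lnl in log_likelihoods.items():
        k = n_params(model, n_taxa)
        scores[model] = aic(lnl, k) if criterion == "AIC" else bic(lnl, k, n_sites)
    best = min(scores, key=scores.get)
    return best, scores


# Made-up log-likelihoods for an 8-taxon, 1000-site alignment (illustration only).
example_lnl = {"JC": -5000.0, "HKY": -4900.0, "GTR": -4894.0}
best_bic, _ = select_model(example_lnl, n_taxa=8, n_sites=1000, criterion="BIC")
best_aic, _ = select_model(example_lnl, n_taxa=8, n_sites=1000, criterion="AIC")
```

With these illustrative numbers the two criteria disagree: BIC penalises the extra GTR parameters more heavily and prefers HKY, while AIC prefers GTR. This is one reason the choice of criterion matters in the likelihood-based workflow.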