ModelTeller: Model Selection for Optimal Phylogenetic Reconstruction Using Machine Learning.

Affiliations

School of Plant Sciences and Food Security, Tel-Aviv University, Tel-Aviv, Israel.

School of Molecular Cell Biology & Biotechnology, Tel-Aviv University, Tel-Aviv, Israel.

Publication information

Mol Biol Evol. 2020 Nov 1;37(11):3338-3352. doi: 10.1093/molbev/msaa154.

Abstract

Statistical criteria have long been the standard for selecting the best model for phylogenetic reconstruction and downstream statistical inference. Although model selection is regarded as a fundamental step in phylogenetics, existing methods for this task are computationally expensive and slow, are not always feasible, and sometimes depend on preliminary assumptions that do not hold for sequence data. Moreover, although these methods are dedicated to revealing the processes that underlie the sequence data, they do not always produce the most accurate trees. Notably, phylogeny reconstruction consists of two related tasks: topology reconstruction and branch-length estimation. It was previously shown that in many cases the most complex model, GTR+I+G, leads to topologies that are as accurate as those obtained using existing model selection criteria, but overestimates branch lengths. Here, we present ModelTeller, a computational methodology for phylogenetic model selection, devised within the machine-learning framework and optimized to predict the most accurate nucleotide substitution model for branch-length estimation. We demonstrate that ModelTeller leads to more accurate branch-length inference than current model selection criteria on data sets simulated under realistic processes. ModelTeller relies on a readily implemented machine-learning model, so prediction from features extracted from the sequence data substantially reduces running time compared with existing strategies. By harnessing the machine-learning framework, we distinguish between two groups of features: those that contribute most to branch-length optimization, which concern the extent of sequence divergence, and those related to estimates of the model parameters, which are important for the selection made by current criteria.
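As an illustration of what a feature "concerning the extent of sequence divergence" might look like, the following minimal sketch computes the mean pairwise p-distance (proportion of differing sites) over a nucleotide alignment. This is an assumption made for illustration only: the abstract does not list ModelTeller's actual features, and the function names and toy alignment below are hypothetical.

```python
from itertools import combinations

VALID_BASES = "ACGT"


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap or an ambiguous
    character are skipped (an illustrative choice, not taken
    from the paper).
    """
    compared = 0
    differing = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in VALID_BASES and y in VALID_BASES:
            compared += 1
            if x != y:
                differing += 1
    return differing / compared if compared else 0.0


def mean_pairwise_p_distance(alignment):
    """Average p-distance over all sequence pairs in the alignment."""
    pairs = list(combinations(alignment, 2))
    if not pairs:
        return 0.0
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment of three sequences, eight sites each.
alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGA"]
divergence = mean_pairwise_p_distance(alignment)
print(f"mean pairwise p-distance: {divergence:.4f}")
```

A summary statistic of this kind can be computed directly from the alignment without fitting any substitution model, which is consistent with the running-time advantage the abstract describes for feature-based prediction.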
