
ModelTeller: Model Selection for Optimal Phylogenetic Reconstruction Using Machine Learning.

Affiliations

School of Plant Sciences and Food Security, Tel-Aviv University, Tel-Aviv, Israel.

School of Molecular Cell Biology & Biotechnology, Tel-Aviv University, Tel-Aviv, Israel.

Publication information

Mol Biol Evol. 2020 Nov 1;37(11):3338-3352. doi: 10.1093/molbev/msaa154.

DOI: 10.1093/molbev/msaa154
PMID: 32585030

Abstract

Statistical criteria have long been the standard for selecting the best model for phylogenetic reconstruction and downstream statistical inference. Although model selection is regarded as a fundamental step in phylogenetics, existing methods for this task consume computational resources for long processing time, they are not always feasible, and sometimes depend on preliminary assumptions which do not hold for sequence data. Moreover, although these methods are dedicated to revealing the processes that underlie the sequence data, they do not always produce the most accurate trees. Notably, phylogeny reconstruction consists of two related tasks, topology reconstruction and branch-length estimation. It was previously shown that in many cases the most complex model, GTR+I+G, leads to topologies that are as accurate as using existing model selection criteria, but overestimates branch lengths. Here, we present ModelTeller, a computational methodology for phylogenetic model selection, devised within the machine-learning framework, optimized to predict the most accurate nucleotide substitution model for branch-length estimation. We demonstrate that ModelTeller leads to more accurate branch-length inference than current model selection criteria on data sets simulated under realistic processes. ModelTeller relies on a readily implemented machine-learning model and thus the prediction according to features extracted from the sequence data results in a substantial decrease in running time compared with existing strategies. By harnessing the machine-learning framework, we distinguish between features that mostly contribute to branch-length optimization, concerning the extent of sequence divergence, and features that are related to estimates of the model parameters that are important for the selection made by current criteria.
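The abstract notes that the features contributing most to branch-length optimization concern the extent of sequence divergence. As a rough illustration of what such a feature could look like, the sketch below computes the mean pairwise p-distance (the proportion of differing sites) over a nucleotide alignment. This is an assumption made for illustration only: the abstract does not list ModelTeller's actual features, and the function names and toy alignment here are hypothetical.

```python
from itertools import combinations
from typing import Mapping

VALID_BASES = set("ACGT")


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous character
    are skipped, so the denominator is the number of comparable sites.
    """
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in VALID_BASES and b in VALID_BASES:
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differences / compared


def mean_pairwise_p_distance(alignment: Mapping[str, str]) -> float:
    """Average p-distance over all pairs of sequences in the alignment.

    Illustrative divergence feature only; not taken from ModelTeller.
    """
    if len({len(seq) for seq in alignment.values()}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    pairs = list(combinations(alignment.values(), 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment (three taxa, eight sites, one gap).
toy_alignment = {
    "taxon1": "ACGTACGT",
    "taxon2": "ACGTACGA",
    "taxon3": "ACGAAC-T",
}

divergence_feature = mean_pairwise_p_distance(toy_alignment)
```

In a feature-based approach of the kind the abstract describes, a value like `divergence_feature` would be one entry in a feature vector passed to a trained predictor, which is what avoids fitting every candidate substitution model to the data.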

Similar articles

1
ModelTeller: Model Selection for Optimal Phylogenetic Reconstruction Using Machine Learning.
Mol Biol Evol. 2020 Nov 1;37(11):3338-3352. doi: 10.1093/molbev/msaa154.
2
Reliable estimation of tree branch lengths using deep neural networks.
PLoS Comput Biol. 2024 Aug 5;20(8):e1012337. doi: 10.1371/journal.pcbi.1012337. eCollection 2024 Aug.
3
Machine learning can be as good as maximum likelihood when reconstructing phylogenetic trees and determining the best evolutionary model on four taxon alignments.
Mol Phylogenet Evol. 2024 Nov;200:108181. doi: 10.1016/j.ympev.2024.108181. Epub 2024 Aug 30.
4
Applications of machine learning in phylogenetics.
Mol Phylogenet Evol. 2024 Jul;196:108066. doi: 10.1016/j.ympev.2024.108066. Epub 2024 Mar 31.
5
On the inference of large phylogenies with long branches: how long is too long?
Bull Math Biol. 2011 Jul;73(7):1627-44. doi: 10.1007/s11538-010-9584-6. Epub 2010 Oct 8.
6
Influence of substitution model selection on protein phylogenetic tree reconstruction.
Gene. 2023 May 20;865:147336. doi: 10.1016/j.gene.2023.147336. Epub 2023 Mar 3.
7
Nucleotide Substitution Model Selection Is Not Necessary for Bayesian Inference of Phylogeny With Well-Behaved Priors.
Syst Biol. 2023 Dec 30;72(6):1418-1432. doi: 10.1093/sysbio/syad041.
8
Linking Branch Lengths across Sets of Loci Provides the Highest Statistical Support for Phylogenetic Inference.
Mol Biol Evol. 2020 Apr 1;37(4):1202-1210. doi: 10.1093/molbev/msz291.
9
ModelRevelator: Fast phylogenetic model estimation via deep learning.
Mol Phylogenet Evol. 2023 Nov;188:107905. doi: 10.1016/j.ympev.2023.107905. Epub 2023 Aug 16.
10
Deep Residual Neural Networks Resolve Quartet Molecular Phylogenies.
Mol Biol Evol. 2020 May 1;37(5):1495-1507. doi: 10.1093/molbev/msz307.

Cited by

1
Opportunities and Challenges in Applying AI to Evolutionary Morphology.
Integr Org Biol. 2024 Sep 23;6(1):obae036. doi: 10.1093/iob/obae036. eCollection 2024.
2
Phylogenetic reconciliation: making the most of genomes to understand microbial ecology and evolution.
ISME J. 2024 Jan 8;18(1). doi: 10.1093/ismejo/wrae129.
3
A machine-learning-based alternative to phylogenetic bootstrap.
Bioinformatics. 2024 Jun 28;40(Suppl 1):i208-i217. doi: 10.1093/bioinformatics/btae255.
4
Toward a Semi-Supervised Learning Approach to Phylogenetic Estimation.
Syst Biol. 2024 Oct 30;73(5):789-806. doi: 10.1093/sysbio/syae029.
5
Forty Years of Inferential Methods in the Journals of the Society for Molecular Biology and Evolution.
Mol Biol Evol. 2024 Jan 3;41(1). doi: 10.1093/molbev/msad264.
6
OrthoMaM v12: a database of curated single-copy ortholog alignments and trees to study mammalian evolutionary genomics.
Nucleic Acids Res. 2024 Jan 5;52(D1):D529-D535. doi: 10.1093/nar/gkad834.
7
AliSim-HPC: parallel sequence simulator for phylogenetics.
Bioinformatics. 2023 Sep 2;39(9). doi: 10.1093/bioinformatics/btad540.
8
Advances in the Applications of Bioinformatics and Chemoinformatics.
Pharmaceuticals (Basel). 2023 Jul 24;16(7):1050. doi: 10.3390/ph16071050.
9
Universal mtDNA fragment for Cervidae barcoding species identification using phylogeny and preliminary analysis of machine learning approach.
Sci Rep. 2023 Jun 5;13(1):9133. doi: 10.1038/s41598-023-35637-z.
10
Taming the Selection of Optimal Substitution Models in Phylogenomics by Site Subsampling and Upsampling.
Mol Biol Evol. 2022 Nov 3;39(11). doi: 10.1093/molbev/msac236.