
Does the choice of nucleotide substitution models matter topologically?

Authors

Hoff Michael, Orf Stefan, Riehm Benedikt, Darriba Diego, Stamatakis Alexandros

Affiliations

Karlsruhe Institute of Technology, Department of Informatics, Kaiserstraße 12, Karlsruhe, 76131, Germany.

The Exelixis Lab, Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Schloss-Wolfsbrunnenweg 35, Heidelberg, 69118, Germany.

Publication information

BMC Bioinformatics. 2016 Mar 24;17:143. doi: 10.1186/s12859-016-0985-x.

DOI:10.1186/s12859-016-0985-x
PMID:27009141
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4806516/
Abstract

BACKGROUND

As part of a master's-level programming practical at the computer science department of the Karlsruhe Institute of Technology, we developed and make available open-source code for testing all 203 possible nucleotide substitution models in the Maximum Likelihood (ML) setting under the common Akaike (AIC), corrected Akaike (AICc), and Bayesian (BIC) information criteria. We address the question of whether model selection matters topologically, that is, whether conducting ML inferences under the optimal model, instead of a standard General Time Reversible (GTR) model, yields different tree topologies. We also assess to what degree the models selected and the trees inferred under the three standard criteria (AIC, AICc, BIC) differ. Finally, we assess whether the definition of the sample size (#sites versus #sites × #taxa) yields different models and, as a consequence, different tree topologies.
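
The quantities named in this paragraph can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes the textbook definitions of AIC, AICc, and BIC, and it assumes that the 203 models correspond to the ways of grouping the six rates of a time-reversible nucleotide model into equality classes (the Bell number B(6)). The function names and the numbers in the usage example are hypothetical.

```python
import math


def count_substitution_schemes(n_rates: int = 6) -> int:
    """Count the ways to group n_rates rates into equality classes.

    Assumption: each of the 203 models is one set partition of the six
    substitution rates, so the count is the Bell number B(n_rates),
    computed here with the Bell triangle.
    """
    row = [1]  # first row of the Bell triangle, its last entry is B(1)
    for _ in range(n_rates - 1):
        new_row = [row[-1]]  # each row starts with the last entry of the previous row
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]


def information_criteria(log_likelihood: float, n_params: int, sample_size: int) -> dict:
    """Return AIC, AICc, and BIC under their standard textbook definitions."""
    if sample_size - n_params - 1 <= 0:
        raise ValueError("AICc needs sample_size > n_params + 1")
    aic = 2 * n_params - 2 * log_likelihood
    aicc = aic + (2 * n_params * (n_params + 1)) / (sample_size - n_params - 1)
    bic = n_params * math.log(sample_size) - 2 * log_likelihood
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


if __name__ == "__main__":
    print("candidate schemes:", count_substitution_schemes())

    # Hypothetical alignment and fit, chosen only for illustration.
    n_sites, n_taxa = 1000, 20
    log_likelihood, n_params = -12345.6, 45

    # The two sample size definitions compared in the abstract.
    for label, n in (("#sites", n_sites), ("#sites x #taxa", n_sites * n_taxa)):
        print(label, information_criteria(log_likelihood, n_params, n))
```

Under these definitions AIC ignores the sample size, so only AICc and BIC can change when the definition switches from #sites to #sites × #taxa: the AICc correction shrinks and the BIC penalty grows as the sample size increases.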

RESULTS

We find that all three factors (in order of impact: nucleotide model selection, information criterion used, sample size definition) can yield substantially different final tree topologies (topological difference exceeding 10 %) for approximately 5 % of the tree inferences conducted on the 39 empirical datasets used in our study.
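
The abstract does not name the metric behind the "topological difference" percentage. The sketch below assumes a relative Robinson-Foulds (RF) distance, a common choice for comparing tree topologies: the number of bipartitions found in only one of two trees, divided by the maximum possible for fully resolved unrooted trees, 2(n - 3). The nested-tuple tree encoding and the function names are hypothetical conveniences, not part of the study's software.

```python
def leaf_set(tree) -> frozenset:
    """Collect the leaf names of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree) -> set:
    """Return the non-trivial splits of a tree.

    Each split is stored as the side that excludes a fixed reference taxon,
    so the result does not depend on where the tuple encoding is rooted.
    """
    all_leaves = leaf_set(tree)
    reference = min(all_leaves)
    splits = set()

    def visit(node) -> frozenset:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if reference in below else below
        # Keep only splits with at least two taxa on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return below

    visit(tree)
    return splits


def relative_rf(tree_a, tree_b) -> float:
    """Relative RF distance in [0, 1] for two trees on the same taxa."""
    leaves = leaf_set(tree_a)
    if leaves != leaf_set(tree_b):
        raise ValueError("trees must contain the same taxa")
    n = len(leaves)
    if n < 4:
        return 0.0  # trees with fewer than four taxa have no non-trivial splits
    rf = len(bipartitions(tree_a) ^ bipartitions(tree_b))
    return rf / (2 * (n - 3))


if __name__ == "__main__":
    # Two hypothetical five-taxon trees that share one of their two splits.
    tree_gtr = (("A", "B"), ("C", "D"), "E")
    tree_best_fit = (("A", "B"), ("C", "E"), "D")
    print(relative_rf(tree_gtr, tree_best_fit))
```

With this measure, a threshold such as "exceeding 10 %" would correspond to `relative_rf(...) > 0.10`.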

CONCLUSIONS

We find that using the best-fit nucleotide substitution model may change the final ML tree topology compared to an inference under a default GTR model. The effect is less pronounced when comparing distinct information criteria. Nonetheless, in some cases we did obtain substantial topological differences.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7121/4806516/0a33c854ffc4/12859_2016_985_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7121/4806516/fbd96efc628d/12859_2016_985_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7121/4806516/73b9d0f18282/12859_2016_985_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7121/4806516/bda681d1bbb1/12859_2016_985_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7121/4806516/cc18b352f523/12859_2016_985_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7121/4806516/6122a9e703d7/12859_2016_985_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7121/4806516/37c9006a97e3/12859_2016_985_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7121/4806516/a870df6aab13/12859_2016_985_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7121/4806516/1ec790b579c1/12859_2016_985_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7121/4806516/b407ebe50b2c/12859_2016_985_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7121/4806516/a870df6aab13/12859_2016_985_Fig11_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7121/4806516/39166d8d8a7e/12859_2016_985_Fig12_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7121/4806516/0594e63c9da9/12859_2016_985_Fig13_HTML.jpg


