
How many bootstrap replicates are necessary?

Authors

Pattengale Nicholas D, Alipour Masoud, Bininda-Emonds Olaf R P, Moret Bernard M E, Stamatakis Alexandros

Affiliation

Department of Computer Science, University of New Mexico, Albuquerque, New Mexico 87123, USA.

Publication information

J Comput Biol. 2010 Mar;17(3):337-54. doi: 10.1089/cmb.2009.0179.

DOI: 10.1089/cmb.2009.0179
PMID: 20377449

Abstract

Phylogenetic bootstrapping (BS) is a standard technique for inferring confidence values on phylogenetic trees. It is based on reconstructing many trees, called replicates, from minor variations of the input data. BS is used with all phylogenetic reconstruction approaches, but we focus here on one of the most popular, maximum likelihood (ML). Because ML inference is so computationally demanding, it has proved too expensive to date to assess the impact of the number of replicates used in BS on the relative accuracy of the support values. For the same reason, a rather small number (typically 100) of BS replicates are computed in real-world studies. Stamatakis et al. recently introduced a BS algorithm that is 1 to 2 orders of magnitude faster than previous techniques, while yielding qualitatively comparable support values, making an experimental study possible. In this article, we propose stopping criteria (that is, thresholds computed at runtime to determine when enough replicates have been generated), and we report on the first large-scale experimental study to assess the effect of the number of replicates on the quality of support values, including the performance of our proposed criteria. We run our tests on 17 diverse real-world DNA datasets (single-gene as well as multi-gene), which include 125 to 2,554 taxa. We find that our stopping criteria typically stop computations after 100 to 500 replicates (although the most conservative criterion may continue for several thousand replicates) while producing support values that correlate at better than 99.5% with the reference values on the best ML trees. Significantly, we also find that the stopping criteria can recommend very different numbers of replicates for different datasets of comparable sizes.
Our results are thus twofold: (i) they give the first experimental assessment of the effect of the number of BS replicates on the quality of support values returned through BS, and (ii) they validate our proposals for stopping criteria. Practitioners will no longer have to enter a guess or worry about the quality of support values; moreover, with most counts of replicates in the 100 to 500 range, robust BS under ML inference becomes computationally practical for most datasets. The complete test suite is available at http://lcbb.epfl.ch/BS.tar.bz2, and BS with our stopping criteria is included in the latest release of RAxML v7.2.5, available at http://wwwkramer.in.tum.de/exelixis/software.html.
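
The abstract describes a stopping criterion only as a threshold computed at runtime, so the following is a minimal sketch of one plausible form of such a test, not the authors' exact procedure. It assumes each replicate tree has already been reduced to a set of split labels (plain strings here). It repeatedly divides the replicates into two random halves, compares the split frequencies of the halves by Pearson correlation, and reports convergence when most random divisions agree. The function names and the default values for `rho`, `permutations`, and `pass_fraction` are illustrative assumptions.

```python
import random
from collections import Counter
from math import sqrt


def split_frequencies(replicates, splits):
    """Fraction of replicates that contain each split, in the order of `splits`."""
    counts = Counter(s for rep in replicates for s in rep)
    n = len(replicates)
    return [counts[s] / n for s in splits]


def pearson(xs, ys):
    """Pearson correlation of two equal-length vectors.

    If either vector is constant the correlation is undefined; this sketch
    returns 1.0 when the vectors are identical and 0.0 otherwise.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 1.0 if xs == ys else 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def enough_replicates(replicates, rho=0.99, permutations=100,
                      pass_fraction=0.99, rng=None):
    """Return True if support values look stable across random half-splits.

    `replicates` is a list of sets of split labels (hypothetical encoding).
    All thresholds are assumed defaults, not values taken from the abstract.
    """
    rng = rng or random.Random(0)
    if len(replicates) < 4:
        return False  # too few replicates to compare two halves meaningfully
    splits = sorted({s for rep in replicates for s in rep})
    half = len(replicates) // 2
    passed = 0
    for _ in range(permutations):
        shuffled = list(replicates)
        rng.shuffle(shuffled)
        first = split_frequencies(shuffled[:half], splits)
        second = split_frequencies(shuffled[half:2 * half], splits)
        if pearson(first, second) >= rho:
            passed += 1
    return passed / permutations >= pass_fraction
```

In a bootstrap driver, a check like this would typically be called after each batch of new replicates (for example every 50), stopping as soon as it returns True or a maximum replicate count is reached; the batch size is likewise an assumption.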

