Uncovering hidden phylogenetic consensus in large data sets.

Affiliation

Sandia National Laboratories, PO Box 5800, Albuquerque, NM 87185, USA.

Publication details

IEEE/ACM Trans Comput Biol Bioinform. 2011 Jul-Aug;8(4):902-11. doi: 10.1109/TCBB.2011.28.

PMID:21301032
Abstract

Many of the steps in phylogenetic reconstruction can be confounded by “rogue” taxa—taxa that cannot be placed with assurance anywhere within the tree, indeed, whose location within the tree varies with almost any choice of algorithm or parameters. Phylogenetic consensus methods, in particular, are known to suffer from this problem. In this paper, we provide a novel framework to define and identify rogue taxa. In this framework, we formulate a bicriterion optimization problem, the relative information criterion, that models the net increase in useful information present in the consensus tree when certain taxa are removed from the input data. We also provide an effective greedy heuristic to identify a subset of rogue taxa and use this heuristic in a series of experiments, with both pathological examples from the literature and a collection of large biological data sets. As the presence of rogue taxa in a set of bootstrap replicates can lead to deceivingly poor support values, we propose a procedure to recompute support values in light of the rogue taxa identified by our algorithm; applying this procedure to our biological data sets caused a large number of edges to move from “unsupported” to “supported” status, indicating that many existing phylogenies should be recomputed and reevaluated to reduce any inaccuracies introduced by rogue taxa. We also discuss the implementation issues encountered while integrating our algorithm into RAxML v7.2.7, particularly those dealing with scaling up the analyses. This integration enables practitioners to benefit from our algorithm in the analysis of very large data sets (up to 2,500 taxa and 10,000 trees, although we present the results of even larger analyses).
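The abstract describes the method only at a high level. To illustrate the general idea, the Python sketch below uses a simplified stand-in, not the authors' relative information criterion or their exact greedy heuristic. It represents each bootstrap tree as a list of splits (one side of each bipartition), recomputes split support after pruning a candidate taxon set, and greedily drops single taxa while a score improves. The score, which is the summed support of majority-rule splits divided by n - 3 for the original taxon count n, is an assumption chosen so that removing taxa is not free. All function names (`restrict`, `split_support`, `score`, `greedy_prune`) are hypothetical.

```python
from collections import Counter


def restrict(split, taxa):
    """Restrict one side of a bipartition to `taxa`; return a canonical side or None."""
    side = split & taxa
    other = taxa - side
    if len(side) < 2 or len(other) < 2:
        return None  # trivial split: one side has < 2 taxa, so no topological information
    anchor = min(taxa)
    # Canonical form: always keep the side that does NOT contain the anchor taxon.
    return other if anchor in side else side


def split_support(trees, taxa):
    """Fraction of trees that contain each non-trivial split over the kept `taxa`."""
    counts = Counter()
    for splits in trees:
        # Use a set so splits that coincide after pruning are counted once per tree.
        kept = {restrict(s, taxa) for s in splits}
        kept.discard(None)
        counts.update(kept)
    return {s: c / len(trees) for s, c in counts.items()}


def score(trees, taxa, n_original):
    """Hypothetical score: summed support of majority-rule splits, over n_original - 3."""
    support = split_support(trees, taxa)
    return sum(v for v in support.values() if v > 0.5) / (n_original - 3)


def greedy_prune(trees, taxa):
    """Greedily drop one taxon at a time while the hypothetical score strictly improves."""
    taxa = frozenset(taxa)
    n = len(taxa)
    best = score(trees, taxa, n)
    dropped = []
    while len(taxa) > 4:
        scored = [(score(trees, taxa - {t}, n), t) for t in sorted(taxa)]
        cand_score, cand = max(scored, key=lambda pair: pair[0])
        if cand_score <= best:
            break  # no single removal improves the score: stop
        best, taxa = cand_score, taxa - {cand}
        dropped.append(cand)
    return dropped, best


if __name__ == "__main__":
    # Toy input: each tree lists one side of each non-trivial split on taxa A-E plus X.
    # X attaches next to the AB clade, to C, or to D, depending on the tree.
    trees = [
        [frozenset("AB"), frozenset("ABX"), frozenset("DE")],
        [frozenset("AB"), frozenset("CX"), frozenset("DE")],
        [frozenset("AB"), frozenset("DX"), frozenset("DXE")],
    ]
    print(greedy_prune(trees, "ABCDEX"))
```

In this toy input the split {D,E} has support 2/3 while X is present and full support once X is pruned, and the greedy loop drops only X. On a toy scale, this mirrors the move from "unsupported" to "supported" that the abstract reports after rogue taxa are removed.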


Similar articles

1
Uncovering hidden phylogenetic consensus in large data sets.
IEEE/ACM Trans Comput Biol Bioinform. 2011 Jul-Aug;8(4):902-11. doi: 10.1109/TCBB.2011.28.
2
Pruning rogue taxa improves phylogenetic accuracy: an efficient algorithm and webservice.
Syst Biol. 2013 Jan 1;62(1):162-6. doi: 10.1093/sysbio/sys078. Epub 2012 Sep 6.
3
Testing the rogue taxa hypothesis for clustering instability.
J Theor Biol. 2019 Jul 7;472:36-45. doi: 10.1016/j.jtbi.2019.04.002. Epub 2019 Apr 4.
4
Genetic algorithm for large-scale maximum parsimony phylogenetic analysis of proteins.
Biochim Biophys Acta. 2005 Aug 30;1725(1):19-29. doi: 10.1016/j.bbagen.2005.04.027.
5
Polyhedral geometry of phylogenetic rogue taxa.
Bull Math Biol. 2011 Jun;73(6):1202-26. doi: 10.1007/s11538-010-9556-x. Epub 2010 Jul 17.
6
COSPEDTree: COuplet Supertree by Equivalence Partitioning of Taxa Set and DAG Formation.
IEEE/ACM Trans Comput Biol Bioinform. 2015 May-Jun;12(3):590-603. doi: 10.1109/TCBB.2014.2366778.
7
How many bootstrap replicates are necessary?
J Comput Biol. 2010 Mar;17(3):337-54. doi: 10.1089/cmb.2009.0179.
8
Sparse supermatrices for phylogenetic inference: taxonomy, alignment, rogue taxa, and the phylogeny of living turtles.
Syst Biol. 2010 Jan;59(1):42-58. doi: 10.1093/sysbio/syp075. Epub 2009 Nov 11.
9
Phylogenetic supermatrix analysis of GenBank sequences from 2228 papilionoid legumes.
Syst Biol. 2006 Oct;55(5):818-36. doi: 10.1080/10635150600999150.
10
Refining phylogenetic trees given additional data: an algorithm based on parsimony.
IEEE/ACM Trans Comput Biol Bioinform. 2009 Jan-Mar;6(1):118-25. doi: 10.1109/TCBB.2008.100.

Cited by

1
The ropAe gene encodes a porin-like protein involved in copper transit in Rhizobium etli CFN42.
Microbiologyopen. 2018 Jun;7(3):e00573. doi: 10.1002/mbo3.573. Epub 2017 Dec 27.
2
HPV16 variants distribution in invasive cancers of the cervix, vulva, vagina, penis, and anus.
Cancer Med. 2016 Oct;5(10):2909-2919. doi: 10.1002/cam4.870. Epub 2016 Sep 21.
3
The Evolution of the Secreted Regulatory Protein Progranulin.
PLoS One. 2015 Aug 6;10(8):e0133749. doi: 10.1371/journal.pone.0133749. eCollection 2015.
4
Concatabominations: identifying unstable taxa in morphological phylogenetics using a heuristic extension to safe taxonomic reduction.
Syst Biol. 2015 Jan;64(1):137-43. doi: 10.1093/sysbio/syu066. Epub 2014 Sep 2.
5
Enumerating all maximal frequent subtrees in collections of phylogenetic trees.
Algorithms Mol Biol. 2014 Jun 18;9:16. doi: 10.1186/1748-7188-9-16. eCollection 2014.
6
RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies.
Bioinformatics. 2014 May 1;30(9):1312-3. doi: 10.1093/bioinformatics/btu033. Epub 2014 Jan 21.
7
A phylogenomic approach to vertebrate phylogeny supports a turtle-archosaur affinity and a possible paraphyletic lissamphibia.
PLoS One. 2012;7(11):e48990. doi: 10.1371/journal.pone.0048990. Epub 2012 Nov 7.
8
A scalable method for identifying frequent subtrees in sets of large phylogenetic trees.
BMC Bioinformatics. 2012 Oct 3;13:256. doi: 10.1186/1471-2105-13-256.
9
Pruning rogue taxa improves phylogenetic accuracy: an efficient algorithm and webservice.
Syst Biol. 2013 Jan 1;62(1):162-6. doi: 10.1093/sysbio/sys078. Epub 2012 Sep 6.
10
The evolution of Dscam genes across the arthropods.
BMC Evol Biol. 2012 Apr 13;12:53. doi: 10.1186/1471-2148-12-53.