Phylogenetic supermatrix analysis of GenBank sequences from 2228 papilionoid legumes.

Authors

McMahon Michelle M, Sanderson Michael J

Affiliation

Section of Evolution and Ecology, University of California Davis, Davis, CA 95616, USA.

Publication

Syst Biol. 2006 Oct;55(5):818-36. doi: 10.1080/10635150600999150.

DOI: 10.1080/10635150600999150
PMID: 17060202

Abstract

A comprehensive phylogeny of papilionoid legumes was inferred from sequences of 2228 taxa in GenBank release 147. A semiautomated analysis pipeline was constructed to download, parse, assemble, align, combine, and build trees from a pool of 11,881 sequences. Initial steps included all-against-all BLAST similarity searches coupled with assembly, using a novel strategy for building length-homogeneous primary sequence clusters. This was followed by a combination of global and local alignment protocols to build larger secondary clusters of locally aligned sequences, thus taking into account the dramatic differences in length of the heterogeneous coding and noncoding sequence data present in GenBank. Next, clusters were checked for the presence of duplicate genes and other potentially misleading sequences and examined for combinability with other clusters on the basis of taxon overlap. Finally, two supermatrices were constructed: a "sparse" matrix based on the primary clusters alone (1794 taxa x 53,977 characters), and a somewhat more "dense" matrix based on the secondary clusters (2228 taxa x 33,168 characters). Both matrices were very sparse, with 95% of their cells containing gaps or question marks. These were subjected to extensive heuristic parsimony analyses using deterministic and stochastic heuristics, including bootstrap analyses. A "reduced consensus" bootstrap analysis was also performed to detect cryptic signal in a subtree of the data set corresponding to a "backbone" phylogeny proposed in previous studies. Overall, the dense supermatrix appeared to provide much more satisfying results, indicated by better resolution of the bootstrap tree, excellent agreement with the backbone papilionoid tree in the reduced bootstrap consensus analysis, few problematic large polytomies in the strict consensus, and less fragmentation of conventionally recognized genera. Nevertheless, at lower taxonomic levels several problems were identified and diagnosed. 
A large number of methodological issues in supermatrix construction at this scale are discussed, including detection of annotation errors in GenBank sequences; the shortage of effective algorithms and software for local multiple sequence alignment; the difficulty of overcoming effects of fragmentation of data into nearly disjoint blocks in sparse supermatrices; and the lack of informative tools to assess confidence limits in very large trees.
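The supermatrix step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`build_supermatrix`, `missing_fraction`, `taxon_overlap`) and the toy clusters are hypothetical, and the sketch assumes each cluster is already aligned so that all of its sequences have equal length. It only shows how concatenating clusters with incomplete taxon coverage produces cells filled with question marks, and how the share of gap or question-mark cells can be measured.

```python
from typing import Dict, List

MISSING = "?"  # placeholder for a taxon that has no sequence in a cluster


def build_supermatrix(clusters: List[Dict[str, str]]) -> Dict[str, str]:
    """Concatenate aligned clusters (taxon -> sequence) into one matrix.

    Taxa absent from a cluster are padded with '?' across that cluster's
    columns. Hypothetical helper, not the published implementation.
    """
    taxa = sorted({taxon for cluster in clusters for taxon in cluster})
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for cluster in clusters:
        lengths = {len(seq) for seq in cluster.values()}
        if len(lengths) != 1:
            raise ValueError("each cluster must be aligned to a single length")
        width = lengths.pop()
        for taxon in taxa:
            rows[taxon].append(cluster.get(taxon, MISSING * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


def missing_fraction(matrix: Dict[str, str]) -> float:
    """Fraction of cells that are gaps ('-') or question marks ('?')."""
    total = sum(len(seq) for seq in matrix.values())
    missing = sum(seq.count("?") + seq.count("-") for seq in matrix.values())
    return missing / total


def taxon_overlap(a: Dict[str, str], b: Dict[str, str]) -> int:
    """Number of taxa shared by two clusters (a simple combinability cue)."""
    return len(set(a) & set(b))


# Toy data: the taxon names and sequences are made up for illustration.
clusters = [
    {"Medicago": "ACGT", "Pisum": "ACGA", "Glycine": "AC-T"},
    {"Pisum": "GGC", "Lupinus": "GGT"},
]
matrix = build_supermatrix(clusters)
for taxon, row in matrix.items():
    print(f"{taxon:10s} {row}")
print(f"missing fraction: {missing_fraction(matrix):.3f}")
print(f"shared taxa: {taxon_overlap(clusters[0], clusters[1])}")
```

In the toy example only one taxon links the two clusters, which is the kind of weak overlap behind the "nearly disjoint blocks" problem the abstract mentions. For scale, the dense matrix reported above has 2228 × 33,168 = 73,898,304 cells, so a missing fraction of about 95% leaves roughly 3.7 million cells with data.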


Similar articles

1. Phylogenetic supermatrix analysis of GenBank sequences from 2228 papilionoid legumes.
   Syst Biol. 2006 Oct;55(5):818-36. doi: 10.1080/10635150600999150.
2. Sparse supermatrices for phylogenetic inference: taxonomy, alignment, rogue taxa, and the phylogeny of living turtles.
   Syst Biol. 2010 Jan;59(1):42-58. doi: 10.1093/sysbio/syp075. Epub 2009 Nov 11.
3. The PhyLoTA Browser: processing GenBank for molecular phylogenetics research.
   Syst Biol. 2008 Jun;57(3):335-46. doi: 10.1080/10635150802158688.
4. PhyloGena--a user-friendly system for automated phylogenetic annotation of unknown sequences.
   Bioinformatics. 2007 Apr 1;23(7):793-801. doi: 10.1093/bioinformatics/btm016. Epub 2007 Mar 1.
5. Resolving ambiguity of species limits and concatenation in multilocus sequence data for the construction of phylogenetic supermatrices.
   Syst Biol. 2013 May 1;62(3):456-66. doi: 10.1093/sysbio/syt011. Epub 2013 Feb 15.
6. Very fast algorithms for evaluating the stability of ML and Bayesian phylogenetic trees from sequence data.
   Genome Inform. 2002;13:82-92.
7. A phylogenomic analysis of the Ascomycota.
   Fungal Genet Biol. 2006 Oct;43(10):715-25. doi: 10.1016/j.fgb.2006.05.001. Epub 2006 Jun 15.
8. Prospects for building the tree of life from large sequence databases.
   Science. 2004 Nov 12;306(5699):1172-4. doi: 10.1126/science.1102036.
9. Phylogenetic relationships in the Papilionoideae (family Leguminosae) based on nucleotide sequences of cpDNA (rbcL) and ncDNA (ITS 1 and 2).
   Mol Phylogenet Evol. 1997 Aug;8(1):65-88. doi: 10.1006/mpev.1997.0410.
10. Vector representations and related matrices of DNA primary sequence based on L-tuple.
    Math Biosci. 2010 Oct;227(2):147-52. doi: 10.1016/j.mbs.2010.07.004. Epub 2010 Aug 3.

Cited by

1. Chloroplast genomic insights into adaptive evolution and rapid radiation in the genus Passiflora (Passifloraceae).
   BMC Plant Biol. 2025 Feb 13;25(1):192. doi: 10.1186/s12870-025-06210-9.
2. Redefining Possible: Combining Phylogenomic and Supersparse Data in Frogs.
   Mol Biol Evol. 2023 May 2;40(5). doi: 10.1093/molbev/msad109.
3. Highly Resolved Papilionoid Legume Phylogeny Based on Plastid Phylogenomics.
   Front Plant Sci. 2022 Feb 23;13:823190. doi: 10.3389/fpls.2022.823190. eCollection 2022.
4. A synthesis tree of the Copepoda: integrating phylogenetic and taxonomic data reveals multiple origins of parasitism.
   PeerJ. 2021 Aug 18;9:e12034. doi: 10.7717/peerj.12034. eCollection 2021.
5. mtDNAcombine: tools to combine sequences from multiple studies.
   BMC Bioinformatics. 2021 Mar 9;22(1):115. doi: 10.1186/s12859-021-04048-0.
6. Total Ortholog Median Matrix as an alternative unsupervised approach for phylogenomics based on evolutionary distance between protein coding genes.
   Sci Rep. 2021 Feb 15;11(1):3791. doi: 10.1038/s41598-021-81926-w.
7. Caught in the Act: Variation in plastid genome inverted repeat expansion within and between populations of .
   Ecol Evol. 2020 Sep 29;10(21):12129-12137. doi: 10.1002/ece3.6839. eCollection 2020 Nov.
8. Exploration of Plastid Phylogenomic Conflict Yields New Insights into the Deep Relationships of Leguminosae.
   Syst Biol. 2020 Jul 1;69(4):613-622. doi: 10.1093/sysbio/syaa013.
9. Towards a barnacle tree of life: integrating diverse phylogenetic efforts into a comprehensive hypothesis of thecostracan evolution.
   PeerJ. 2019 Aug 16;7:e7387. doi: 10.7717/peerj.7387. eCollection 2019.
10. Lost and Found: Return of the Inverted Repeat in the Legume Clade Defined by Its Absence.
    Genome Biol Evol. 2019 Apr 1;11(4):1321-1333. doi: 10.1093/gbe/evz076.