
Exploring the relationship between sequence similarity and accurate phylogenetic trees.

Author information

Cantarel Brandi L, Morrison Hilary G, Pearson William

Affiliations

Department of Biochemistry and Molecular Genetics, University of Virginia, VA, USA.

Publication information

Mol Biol Evol. 2006 Nov;23(11):2090-100. doi: 10.1093/molbev/msl080. Epub 2006 Aug 4.

DOI:10.1093/molbev/msl080
PMID:16891377
Abstract

We have characterized the relationship between accurate phylogenetic reconstruction and sequence similarity, testing whether high levels of sequence similarity can consistently produce accurate evolutionary trees. We generated protein families with known phylogenies using a modified version of the PAML/EVOLVER program that produces insertions and deletions as well as substitutions. Protein families were evolved over a range of 100-400 point accepted mutations; at these distances 63% of the families shared significant sequence similarity. Protein families were evolved using balanced and unbalanced trees, with ancient or recent radiations. In families sharing statistically significant similarity, about 60% of multiple sequence alignments were 95% identical to true alignments. To compare recovered topologies with true topologies, we used a score that reflects the fraction of clades that were correctly clustered. As expected, the accuracy of the phylogenies was greatest in the least divergent families. About 88% of phylogenies clustered over 80% of clades in families that shared significant sequence similarity, using Bayesian, parsimony, distance, and maximum likelihood methods. However, for protein families with short ancient branches (ancient radiation), only 30% of the most divergent (but statistically significant) families produced accurate phylogenies, and only about 70% of the second most highly conserved families, with median expectation values better than 10^-60, produced accurate trees. These values represent upper bounds on expected tree accuracy for sequences with a simple divergence history; proteins from 700 Giardia families, with a similar range of sequence similarities but considerably more gaps, produced much less accurate trees.
For our simulated insertions and deletions, correct multiple sequence alignments did not perform much better than those produced by T-COFFEE, and including sequences with expressed sequence tag-like sequencing errors did not significantly decrease phylogenetic accuracy. In general, although less-divergent sequence families produce more accurate trees, the likelihood of estimating an accurate tree is most dependent on whether radiation in the family was ancient or recent. Accuracy can be improved by combining genes from the same organism when creating species trees or by selecting protein families with the best bootstrap values in comprehensive studies.
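
The abstract mentions a score reflecting the fraction of clades that were correctly clustered, but does not give its definition. The following is a minimal sketch of one plausible reading, not the authors' actual scoring code: it assumes rooted trees written as nested tuples of leaf names, treats a clade as the set of leaves under an internal node, and reports the share of true clades that also appear in the recovered tree. The names `clades` and `clade_recovery` and the example trees are hypothetical.

```python
from typing import FrozenSet, Set, Tuple, Union

# A tree is either a leaf name or a tuple of subtrees (assumed representation).
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the leaf set of every internal node, excluding the root."""
    found: Set[FrozenSet[str]] = set()

    def leaves(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    root = leaves(tree)
    found.discard(root)  # the root clade is shared by any tree on the same leaves
    return found


def clade_recovery(true_tree: Tree, inferred_tree: Tree) -> float:
    """Fraction of true clades that are also present in the inferred tree."""
    true_clades = clades(true_tree)
    if not true_clades:
        return 1.0  # nothing to recover in a tree with no internal clades
    return len(true_clades & clades(inferred_tree)) / len(true_clades)


# Hypothetical five-taxon example: the inferred tree misplaces B and C.
true_tree = ((("A", "B"), "C"), ("D", "E"))
inferred_tree = ((("A", "C"), "B"), ("D", "E"))
score = clade_recovery(true_tree, inferred_tree)
print(f"{score:.3f}")  # 2 of the 3 true clades are recovered
```

Under this reading, a phylogeny that "clustered over 80% of clades" would be one with a score above 0.8; the example tree, with two of three true clades recovered, would fall short of that threshold.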
