Phylogenetic Tree Estimation With and Without Alignment: New Distance Methods and Benchmarking.

Authors

Bogusz Marcin, Whelan Simon

Affiliation

Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, 752 36 Uppsala, Sweden.

Publication

Syst Biol. 2017 Mar 1;66(2):218-231. doi: 10.1093/sysbio/syw074.

DOI: 10.1093/sysbio/syw074
PMID: 27633353
Abstract

Phylogenetic tree inference is a critical component of many systematic and evolutionary studies. The majority of these studies are based on the two-step process of multiple sequence alignment followed by tree inference, despite persistent evidence that the alignment step can lead to biased results. Here we present a two-part study that first presents PaHMM-Tree, a novel neighbor joining-based method that estimates pairwise distances without assuming a single alignment. We then use simulations to benchmark its performance against a wide range of other phylogenetic tree inference methods, including the first comparison of alignment-free distance-based methods against more conventional tree estimation methods. Our new method for calculating pairwise distances based on statistical alignment provides distance estimates that are as accurate as those obtained using standard methods based on the true alignment. Pairwise distance estimates based on the two-step process tend to be substantially less accurate. This improved performance carries through to tree inference, where PaHMM-Tree provides more accurate tree estimates than all of the pairwise distance methods assessed. For close to moderately divergent sequence data we find that the two-step methods using statistical inference, where information from all sequences is included in the estimation procedure, tend to perform better than PaHMM-Tree, particularly full statistical alignment, which simultaneously estimates both the tree and the alignment. For deep divergences we find the alignment step becomes so prone to error that our distance-based PaHMM-Tree outperforms all other methods of tree inference. Finally, we find that the accuracy of alignment-free methods tends to decline faster than standard two-step methods in the presence of alignment uncertainty, and identify no conditions where alignment-free methods are equal to or more accurate than standard phylogenetic methods even in the presence of substantial alignment error. [Alignment-free; distance-based phylogenetics; pair Hidden Markov Models; phylogenetic inference; statistical alignment.]
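
The abstract describes PaHMM-Tree as a neighbor joining-based method that builds a tree from pairwise distances. As a rough illustration of that final tree-building step only, the following is a minimal sketch of the generic neighbor joining algorithm. It is not the authors' implementation and does not estimate distances from pair Hidden Markov Models. The function name `neighbor_joining`, the internal node labels (`n1`, `n2`, ...) and the five-taxon distance matrix are assumptions made for this example.

```python
from itertools import combinations


def neighbor_joining(names, matrix):
    """Generic neighbor joining on a symmetric distance matrix.

    Returns the unrooted tree as a list of (node, node, branch_length) edges.
    Taxon names are assumed to be unique and not to clash with the
    hypothetical internal labels "n1", "n2", ...
    """
    nodes = list(names)
    # Distances keyed by unordered pairs of node labels.
    d = {frozenset((a, b)): matrix[i][j]
         for (i, a), (j, b) in combinations(enumerate(names), 2)}
    edges = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums of the current distance matrix.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a)
             for a in nodes}
        # Join the pair that minimises Q(i, j) = (n - 2) d_ij - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dij = d[frozenset((i, j))]
        counter += 1
        u = f"n{counter}"  # new internal node
        # Branch lengths from the joined nodes to the new internal node.
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        edges.append((i, u, li))
        edges.append((j, u, dij - li))
        # Distances from the new node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij)
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Connect the last two nodes directly.
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


# Illustrative (made-up) additive distance matrix for five taxa.
names = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
for left, right, length in neighbor_joining(names, matrix):
    print(left, right, length)
```

In a pipeline of the kind the abstract outlines, the matrix entries would come from the distance estimation step rather than being typed in by hand. The tree-building step itself does not depend on how the distances were obtained, which is why the accuracy of the distance estimates carries through to the inferred tree.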


Similar articles

1
Phylogenetic Tree Estimation With and Without Alignment: New Distance Methods and Benchmarking.
Syst Biol. 2017 Mar 1;66(2):218-231. doi: 10.1093/sysbio/syw074.
2
Bayesian coestimation of phylogeny and sequence alignment.
BMC Bioinformatics. 2005 Apr 1;6:83. doi: 10.1186/1471-2105-6-83.
3
Evidence of Statistical Inconsistency of Phylogenetic Methods in the Presence of Multiple Sequence Alignment Uncertainty.
Genome Biol Evol. 2015 Jul 1;7(8):2102-16. doi: 10.1093/gbe/evv127.
4
Statistically Consistent k-mer Methods for Phylogenetic Tree Reconstruction.
J Comput Biol. 2017 Feb;24(2):153-171. doi: 10.1089/cmb.2015.0216. Epub 2016 Jul 7.
5
Twisted trees and inconsistency of tree estimation when gaps are treated as missing data - The impact of model mis-specification in distance corrections.
Mol Phylogenet Evol. 2015 Dec;93:289-95. doi: 10.1016/j.ympev.2015.07.027. Epub 2015 Aug 6.
6
Assessment of protein distance measures and tree-building methods for phylogenetic tree reconstruction.
Mol Biol Evol. 2005 Nov;22(11):2257-64. doi: 10.1093/molbev/msi224. Epub 2005 Jul 27.
7
Phylogenetic inference with weighted codon evolutionary distances.
J Mol Evol. 2009 Apr;68(4):377-92. doi: 10.1007/s00239-009-9212-y. Epub 2009 Mar 24.
8
On the quality of tree-based protein classification.
Bioinformatics. 2005 May 1;21(9):1876-90. doi: 10.1093/bioinformatics/bti244. Epub 2005 Jan 12.
9
A Poissonian Model of Indel Rate Variation for Phylogenetic Tree Inference.
Syst Biol. 2017 Sep 1;66(5):698-714. doi: 10.1093/sysbio/syx033.
10
Class of multiple sequence alignment algorithm affects genomic analysis.
Mol Biol Evol. 2013 Mar;30(3):642-53. doi: 10.1093/molbev/mss256. Epub 2012 Nov 9.

Cited by

1
SYNY: a pipeline to investigate and visualize collinearity between genomes.
bioRxiv. 2024 May 13:2024.05.09.593317. doi: 10.1101/2024.05.09.593317.
2
Diversity of Arbuscular Mycorrhizal Fungi in Distinct Ecosystems of the North Caucasus, a Temperate Biodiversity Hotspot.
J Fungi (Basel). 2023 Dec 24;10(1):11. doi: 10.3390/jof10010011.
3
Quantifying the uncertainty of assembly-free genome-wide distance estimates and phylogenetic relationships using subsampling.
Cell Syst. 2022 Oct 19;13(10):817-829.e3. doi: 10.1016/j.cels.2022.06.007.
4
Genome-wide alignment-free phylogenetic distance estimation under a no strand-bias model.
Bioinform Adv. 2022 Aug 12;2(1):vbac055. doi: 10.1093/bioadv/vbac055. eCollection 2022.
5
Druggability for COVID-19: in silico discovery of potential drug compounds against nucleocapsid (N) protein of SARS-CoV-2.
Genomics Inform. 2020 Dec;18(4):e43. doi: 10.5808/GI.2020.18.4.e43. Epub 2020 Dec 9.
6
Automated Removal of Non-homologous Sequence Stretches with PREQUAL.
Methods Mol Biol. 2021;2231:147-162. doi: 10.1007/978-1-0716-1036-7_10.
7
Machine learning based imputation techniques for estimating phylogenetic trees from incomplete distance matrices.
BMC Genomics. 2020 Jul 20;21(1):497. doi: 10.1186/s12864-020-06892-5.
8
The Cumulative Indel Model: Fast and Accurate Statistical Evolutionary Alignment.
Syst Biol. 2021 Feb 10;70(2):236-257. doi: 10.1093/sysbio/syaa050.
9
An analysis of acquired antimicrobial resistance genes in plasmids.
AIMS Microbiol. 2020 Mar 16;6(1):75-91. doi: 10.3934/microbiol.2020005. eCollection 2020.
10
Identifying Clusters of High Confidence Homologies in Multiple Sequence Alignments.
Mol Biol Evol. 2019 Oct 1;36(10):2340-2351. doi: 10.1093/molbev/msz142.