Twisted trees and inconsistency of tree estimation when gaps are treated as missing data - The impact of model mis-specification in distance corrections.

Authors

McTavish Emily Jane, Steel Mike, Holder Mark T

Affiliations

Heidelberg Institute for Theoretical Studies, Heidelberg, Germany; Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS, USA.

Biomathematics Research Centre, University of Canterbury, Christchurch, New Zealand.

Publication information

Mol Phylogenet Evol. 2015 Dec;93:289-95. doi: 10.1016/j.ympev.2015.07.027. Epub 2015 Aug 6.

DOI:10.1016/j.ympev.2015.07.027
PMID:26256643
Abstract

Statistically consistent estimation of phylogenetic trees or gene trees is possible if pairwise sequence dissimilarities can be converted to a set of distances that are proportional to the true evolutionary distances. Susko et al. (2004) reported some strikingly broad results about the forms of inconsistency in tree estimation that can arise if corrected distances are not proportional to the true distances. They showed that if the corrected distance is a concave function of the true distance, then inconsistency due to long branch attraction will occur. If these functions are convex, then two "long branch repulsion" trees will be preferred over the true tree, although these two incorrect trees are expected to be tied as the preferred tree. Here we extend their results, and demonstrate the existence of a tree shape (which we refer to as a "twisted Farris-zone" tree) for which a single incorrect tree topology will be guaranteed to be preferred if the corrected distance function is convex. We also report that the standard practice of treating gaps in sequence alignments as missing data is sufficient to produce non-linear corrected distance functions if the substitution process is not independent of the insertion/deletion process. Taken together, these results imply inconsistent tree inference under mild conditions. For example, if some positions in a sequence are constrained to be free of substitutions and insertion/deletion events while the remaining sites evolve with independent substitutions and insertion/deletion events, then the distances obtained by treating gaps as missing data can support an incorrect tree topology even given an unlimited amount of data.
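
As a rough illustration of the concave and convex cases summarized in the abstract, the sketch below applies the four-point condition to a four-taxon tree ((A,B),(C,D)). Everything specific in it is an assumption made for illustration and is not taken from the paper: the branch lengths, the two correction functions (1 - exp(-x) as a concave example, exp(x) - 1 as a convex one), and the use of the smallest four-point sum as a stand-in for a distance-based tree estimator. The "twisted Farris-zone" tree itself is not reproduced here.

```python
import math


def path_distances(a, b, c, d, internal):
    """True path-length distances on the tree ((A,B),(C,D)).

    a, b, c, d are the terminal branch lengths; internal is the
    length of the single internal branch. All values are hypothetical.
    """
    return {
        "AB": a + b,
        "CD": c + d,
        "AC": a + internal + c,
        "BD": b + internal + d,
        "AD": a + internal + d,
        "BC": b + internal + c,
    }


def quartet_scores(dist, f):
    """Four-point sums after applying the function f to every distance."""
    return {
        "AB|CD": f(dist["AB"]) + f(dist["CD"]),
        "AC|BD": f(dist["AC"]) + f(dist["BD"]),
        "AD|BC": f(dist["AD"]) + f(dist["BC"]),
    }


def preferred(scores, tol=1e-12):
    """Topologies whose four-point sum is smallest (ties are all returned)."""
    best = min(scores.values())
    return sorted(t for t, s in scores.items() if s - best <= tol)


def linear(x):
    return x                  # distances proportional to the truth


def concave(x):
    return -math.expm1(-x)    # 1 - exp(-x), an illustrative concave map


def convex(x):
    return math.expm1(x)      # exp(x) - 1, an illustrative convex map


# Long branches A and C sit on opposite sides of a short internal branch.
felsenstein_zone = path_distances(a=1.0, b=0.1, c=1.0, d=0.1, internal=0.05)
# Long branches A and B are sisters.
farris_zone = path_distances(a=1.0, b=1.0, c=0.1, d=0.1, internal=0.05)

print(preferred(quartet_scores(felsenstein_zone, linear)))   # true tree
print(preferred(quartet_scores(felsenstein_zone, concave)))  # long branches grouped
print(preferred(quartet_scores(farris_zone, linear)))        # true tree
print(preferred(quartet_scores(farris_zone, convex)))        # two incorrect trees tie
```

With these hypothetical values, the undistorted distances recover AB|CD in both cases, the concave map groups the two long branches (AC|BD), and the convex map leaves the two incorrect topologies tied ahead of the true tree, which matches the qualitative pattern attributed to Susko et al. (2004) above.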

Similar articles

1
Twisted trees and inconsistency of tree estimation when gaps are treated as missing data - The impact of model mis-specification in distance corrections.
Mol Phylogenet Evol. 2015 Dec;93:289-95. doi: 10.1016/j.ympev.2015.07.027. Epub 2015 Aug 6.
2
Accuracy of estimated phylogenetic trees from molecular data. II. Gene frequency data.
J Mol Evol. 1983;19(2):153-70. doi: 10.1007/BF02300753.
3
Phylogenetic Tree Estimation With and Without Alignment: New Distance Methods and Benchmarking.
Syst Biol. 2017 Mar 1;66(2):218-231. doi: 10.1093/sysbio/syw074.
4
Statistically Consistent k-mer Methods for Phylogenetic Tree Reconstruction.
J Comput Biol. 2017 Feb;24(2):153-171. doi: 10.1089/cmb.2015.0216. Epub 2016 Jul 7.
5
On inconsistency of the neighbor-joining, least squares, and minimum evolution estimation when substitution processes are incorrectly modeled.
Mol Biol Evol. 2004 Sep;21(9):1629-42. doi: 10.1093/molbev/msh159. Epub 2004 May 21.
6
Phylogenetic inference under varying proportions of indel-induced alignment gaps.
BMC Evol Biol. 2009 Aug 23;9:211. doi: 10.1186/1471-2148-9-211.
7
Bayesian coestimation of phylogeny and sequence alignment.
BMC Bioinformatics. 2005 Apr 1;6:83. doi: 10.1186/1471-2105-6-83.
8
SATe-II: very fast and accurate simultaneous estimation of multiple sequence alignments and phylogenetic trees.
Syst Biol. 2012 Jan;61(1):90-106. doi: 10.1093/sysbio/syr095. Epub 2011 Dec 1.
9
Accuracy of estimated phylogenetic trees from molecular data. I. Distantly related species.
J Mol Evol. 1982;18(6):387-404. doi: 10.1007/BF01840887.
10
Distances that perfectly mislead.
Syst Biol. 2004 Apr;53(2):327-32. doi: 10.1080/10635150490423809.

Cited by

1
On the transformation of MinHash-based uncorrected distances into proper evolutionary distances for phylogenetic inference.
F1000Res. 2020 Nov 10;9:1309. doi: 10.12688/f1000research.26930.1. eCollection 2020.
2
Accurate Inference of Tree Topologies from Multiple Sequence Alignments Using Deep Learning.
Syst Biol. 2020 Mar 1;69(2):221-233. doi: 10.1093/sysbio/syz060.
3
A 250 plastome phylogeny of the grass family (Poaceae): topological support under different data partitions.
PeerJ. 2018 Feb 2;6:e4299. doi: 10.7717/peerj.4299. eCollection 2018.
4
Inferring Rates and Length-Distributions of Indels Using Approximate Bayesian Computation.
Genome Biol Evol. 2017 May 1;9(5):1280-1294. doi: 10.1093/gbe/evx084.
5
Maximum Likelihood Phylogenetic Inference is Consistent on Multiple Sequence Alignments, with or without Gaps.
Syst Biol. 2016 Mar;65(2):328-33. doi: 10.1093/sysbio/syv089. Epub 2015 Nov 28.