Short branch attraction in phylogenomic inference under the multispecies coalescent.

Authors

Liu Liang, Yu Lili, Wu Shaoyuan, Arnold Jonathan, Whalen Christopher, Davis Charles, Edwards Scott

Affiliations

Department of Statistics and Institute of Bioinformatics, University of Georgia, Athens, GA, United States.

Department of Biostatistics, Georgia Southern University, Statesboro, GA, United States.

Publication information

Front Ecol Evol. 2023;11. doi: 10.3389/fevo.2023.1134764. Epub 2023 Jun 28.

DOI:10.3389/fevo.2023.1134764
PMID:39233780
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11372852/
Abstract

Accurate reconstruction of species trees often relies on the quality of the input gene trees estimated from molecular sequences. Previous studies suggested that, when sequence length is fixed, maximum likelihood may produce biased gene trees that subsequently mislead species tree inference. Two key questions arise in this context. First, which scenarios result in consistently biased gene trees? Second, for those scenarios, are there remedies that remove or at least reduce the misleading effects of consistently biased gene trees? In this article, we establish a theoretical framework to address these questions. Considering a scenario where the true gene tree is a 4-taxon star tree in which two of the four species sit on short branches, we demonstrate that maximum likelihood significantly favors the wrong bifurcating tree that groups those two short-branch species together. We name this inconsistent behavior short branch attraction, which may occur in real-world data involving a 4-taxon bifurcating gene tree with a short internal branch. If no mutation occurs along the internal branch, which is likely when the internal branch is short, the 4-taxon bifurcating tree is equivalent to the 4-taxon star tree and thus suffers the same misleading effect of short branch attraction. Theoretical and simulation results further demonstrate that short branch attraction may occur in gene trees and species trees of arbitrary size. Moreover, short branch attraction is primarily caused by a lack of phylogenetic information in the sequence data, suggesting that converting short internal branches to polytomies in the estimated gene trees can significantly reduce the artifacts it induces.
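
The two mechanical ideas in the abstract, that a short internal branch is likely to carry no mutation and that short internal branches can be converted to polytomies, can be illustrated with a small sketch. It is an illustration under stated assumptions and not the authors' implementation. The `Node` class, the function names, the branch lengths, the sequence length and the threshold are all hypothetical. `prob_no_substitution` assumes a Jukes-Cantor-style model in which substitution events on a branch of length t (expected substitutions per site) are Poisson with mean t at each of n independent sites, so the probability of no substitution anywhere on the branch is exp(-t n).

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical minimal tree node; `length` is the branch above the node."""
    name: str = ""
    length: float = 0.0
    children: List["Node"] = field(default_factory=list)


def prob_no_substitution(branch_length: float, n_sites: int) -> float:
    """Probability that no substitution occurs on a branch across n_sites.

    Assumes Poisson substitution events with mean `branch_length` per site
    and independent sites (a Jukes-Cantor-style simplification).
    """
    if branch_length < 0 or n_sites < 0:
        raise ValueError("branch_length and n_sites must be non-negative")
    return math.exp(-branch_length * n_sites)


def collapse_short_branches(node: Node, threshold: float) -> Node:
    """Return a copy of the tree with short internal branches collapsed.

    Any non-root internal branch shorter than `threshold` is removed and its
    children are attached to its parent, which creates a polytomy. The removed
    length is added to each promoted child so path lengths stay unchanged.
    """
    new_children = []
    for child in node.children:
        child = collapse_short_branches(child, threshold)  # post-order
        if child.children and child.length < threshold:
            for grandchild in child.children:
                new_children.append(
                    Node(grandchild.name,
                         grandchild.length + child.length,
                         grandchild.children)
                )
        else:
            new_children.append(child)
    return Node(node.name, node.length, new_children)


# Hypothetical 4-taxon gene tree ((A,B),C,D) with a very short internal branch.
tree = Node(children=[
    Node(length=0.0005, children=[Node("A", 0.01), Node("B", 0.01)]),
    Node("C", 0.5),
    Node("D", 0.5),
])

# Chance that the internal branch shows no substitution in a 1000-site alignment.
print(prob_no_substitution(0.0005, 1000))

# Collapsing that branch turns the bifurcating tree into a 4-taxon star tree.
star = collapse_short_branches(tree, threshold=0.001)
print([child.name for child in star.children])
```

The threshold here is arbitrary. In practice the cutoff for what counts as a short branch would have to be chosen for the data set at hand, and the abstract does not specify one.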


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a51d/11372852/770af4c6da96/nihms-2011024-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a51d/11372852/d6c271cade99/nihms-2011024-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a51d/11372852/1fa3c7318877/nihms-2011024-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a51d/11372852/1ccd16cc1bc1/nihms-2011024-f0003.jpg
