An empirical assessment of long-branch attraction artefacts in deep eukaryotic phylogenomics.

Author information

Brinkmann Henner, van der Giezen Mark, Zhou Yan, Poncelin de Raucourt Gaëtan, Philippe Hervé

Affiliation

Canadian Institute for Advanced Research, Centre Robert Cedergren, Département de Biochimie, Université de Montréal, Succursale Centre-Ville, Montréal, Québec H3C3J7, Canada.

Publication information

Syst Biol. 2005 Oct;54(5):743-57. doi: 10.1080/10635150500234609.

PMID:16243762

Abstract

In the context of exponentially growing molecular databases, it becomes increasingly easy to assemble large multigene data sets for phylogenomic studies. The expected increase of resolution due to the reduction of the sampling (stochastic) error is becoming a reality. However, the impact of systematic biases will also become more apparent or even dominant. We have chosen to study the case of the long-branch attraction artefact (LBA) using real instead of simulated sequences. Two fast-evolving eukaryotic lineages, whose evolutionary positions are well established, microsporidia and the nucleomorph of cryptophytes, were chosen as model species. A large data set was assembled (44 species, 133 genes, and 24,294 amino acid positions) and the resulting rooted eukaryotic phylogeny (using a distant archaeal outgroup) is positively misled by an LBA artefact despite the use of a maximum likelihood-based tree reconstruction method with a complex model of sequence evolution. When the fastest evolving proteins from the fast lineages are progressively removed (up to 90%), the bootstrap support for the apparently artefactual basal placement decreases to virtually 0%, and conversely only the expected placement, among all the possible locations of the fast-evolving species, receives increasing support that eventually converges to 100%. The percentage of removal of the fastest evolving proteins constitutes a reliable estimate of the sensitivity of phylogenetic inference to LBA. This protocol confirms that both a rich species sampling (especially the presence of a species that is closely related to the fast-evolving lineage) and a probabilistic method with a complex model are important to overcome the LBA artefact. Finally, we observed that phylogenetic inference methods perform strikingly better with simulated as opposed to real data, and suggest that testing the reliability of phylogenetic inference methods with simulated data leads to overconfidence in their performance. Although phylogenomic studies can be affected by systematic biases, the possibility of discarding a large amount of data containing most of the nonphylogenetic signal allows recovering a phylogeny that is less affected by systematic biases, while maintaining a high statistical support.
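
The progressive-removal step described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `progressive_removal`, the per-gene rate values, and the gene names are all hypothetical, and the sketch assumes that some rate measure for the fast-evolving lineage (for example, its branch length in each single-gene tree) has already been computed. It only shows how nested gene subsets could be built by dropping the fastest-evolving proteins in increasing proportions; tree reconstruction and bootstrapping on each subset are left out.

```python
from typing import Dict, List


def progressive_removal(gene_rates: Dict[str, float],
                        fractions: List[float]) -> Dict[float, List[str]]:
    """For each removal fraction, return the genes retained after
    discarding that share of the fastest-evolving genes.

    gene_rates: hypothetical per-gene rate of the fast lineage
                (assumed to be precomputed elsewhere).
    fractions:  removal fractions between 0.0 and 1.0.
    """
    # Rank genes from slowest to fastest; ties are broken by name
    # so that the result is deterministic.
    ranked = sorted(gene_rates, key=lambda g: (gene_rates[g], g))
    retained = {}
    for frac in fractions:
        n_remove = round(frac * len(ranked))
        # Keep the slowest genes, drop the n_remove fastest ones.
        retained[frac] = ranked[: len(ranked) - n_remove]
    return retained


# Toy input: five made-up genes with made-up rates.
rates = {"geneA": 0.12, "geneB": 0.95, "geneC": 0.40,
         "geneD": 2.10, "geneE": 0.33}
subsets = progressive_removal(rates, [0.0, 0.2, 0.4, 0.8])
for frac, genes in subsets.items():
    print(f"remove {frac:.0%}: keep {genes}")
```

Each retained subset would then be concatenated and analysed separately, so that bootstrap support for the competing placements of the fast-evolving lineage can be tracked as a function of the removal percentage.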

Similar articles

Lack of resolution in the animal phylogeny: closely spaced cladogeneses or undetected systematic errors?

Mol Biol Evol. 2007 Jan;24(1):6-9. doi: 10.1093/molbev/msl137. Epub 2006 Sep 29.

Heterotachy and tree building: a case study with plastids and eubacteria.

Mol Biol Evol. 2006 Jan;23(1):40-5. doi: 10.1093/molbev/msj005. Epub 2005 Sep 8.

Exploring rate variation among and within sites in a densely sampled tree: species level phylogenetics of North American tiger beetles (genus Cicindela).

Syst Biol. 2005 Feb;54(1):4-20. doi: 10.1080/10635150590906028.

Maximum likelihood estimates of species trees: how accuracy of phylogenetic inference depends upon the divergence history and sampling design.

Syst Biol. 2009 Oct;58(5):501-8. doi: 10.1093/sysbio/syp045. Epub 2009 Aug 20.

Phylogenomic analysis supports the monophyly of cryptophytes and haptophytes and the association of rhizaria with chromalveolates.

Mol Biol Evol. 2007 Aug;24(8):1702-13. doi: 10.1093/molbev/msm089. Epub 2007 May 7.

Molecular phylogeny of acariform mites (Acari, Arachnida): strong conflict between phylogenetic signal and long-branch attraction artifacts.

Mol Phylogenet Evol. 2010 Jul;56(1):222-41. doi: 10.1016/j.ympev.2009.12.020. Epub 2010 Jan 6.

SuperTRI: A new approach based on branch support analyses of multiple independent data sets for assessing reliability of phylogenetic inferences.

C R Biol. 2009 Sep;332(9):832-47. doi: 10.1016/j.crvi.2009.05.001. Epub 2009 Jun 18.

Untangling long branches: identifying conflicting phylogenetic signals using spectral analysis, neighbor-net, and consensus networks.

Syst Biol. 2005 Aug;54(4):620-33. doi: 10.1080/106351591007462.

Covarion shifts cause a long-branch attraction artifact that unites microsporidia and archaebacteria in EF-1alpha phylogenies.

Mol Biol Evol. 2004 Jul;21(7):1340-9. doi: 10.1093/molbev/msh130. Epub 2004 Mar 19.

Cited by

Ancient Host-Virus Gene Transfer Hints at a Diverse Pre-LECA Virosphere.

J Mol Evol. 2025 Apr 29. doi: 10.1007/s00239-025-10246-8.

Unraveling myriapod evolution: sealion, a novel quartet-based approach for evaluating phylogenetic uncertainty.

NAR Genom Bioinform. 2025 Mar 7;7(1):lqaf018. doi: 10.1093/nargab/lqaf018. eCollection 2025 Mar.

A Phylogenomic Backbone for Acoelomorpha Inferred From Transcriptomic Data.

Syst Biol. 2025 Feb 10;74(1):70-85. doi: 10.1093/sysbio/syae057.

The genus contains only two species, both unable to produce microcystins: and .

iScience. 2024 Aug 30;27(9):110845. doi: 10.1016/j.isci.2024.110845. eCollection 2024 Sep 20.

A taxon-rich and genome-scale phylogeny of Opisthokonta.

PLoS Biol. 2024 Sep 16;22(9):e3002794. doi: 10.1371/journal.pbio.3002794. eCollection 2024 Sep.

GTRpmix: A Linked General Time-Reversible Model for Profile Mixture Models.

Mol Biol Evol. 2024 Sep 4;41(9). doi: 10.1093/molbev/msae174.

The promise and pitfalls of synteny in phylogenomics.

PLoS Biol. 2024 May 20;22(5):e3002632. doi: 10.1371/journal.pbio.3002632. eCollection 2024 May.

Is Over-parameterization a Problem for Profile Mixture Models?

Syst Biol. 2024 May 27;73(1):53-75. doi: 10.1093/sysbio/syad063.

Virus Pop-Expanding Viral Databases by Protein Sequence Simulation.

Viruses. 2023 May 24;15(6):1227. doi: 10.3390/v15061227.

Mitochondrial genome comparison reveals the evolution of cnidarians.

Ecol Evol. 2023 Jun 13;13(6):e10157. doi: 10.1002/ece3.10157. eCollection 2023 Jun.
