Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model.

Author information

Lartillot Nicolas, Brinkmann Henner, Philippe Hervé

Affiliation

Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier, UMR 5506, CNRS-Université de Montpellier 2, Montpellier Cedex 5, France.

Publication information

BMC Evol Biol. 2007 Feb 8;7 Suppl 1(Suppl 1):S4. doi: 10.1186/1471-2148-7-S1-S4.

DOI: 10.1186/1471-2148-7-S1-S4

PMID: 17288577

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1796613/

Abstract

BACKGROUND

Thanks to the large amount of signal contained in genome-wide sequence alignments, phylogenomic analyses are converging towards highly supported trees. However, high statistical support does not imply that the tree is accurate. Systematic errors, such as the long-branch attraction (LBA) artefact, can be misleading, in particular when taxon sampling is poor or the outgroup is distant. Within an otherwise consistent probabilistic framework, systematic errors in genome-wide analyses can be traced back to model mis-specification. This suggests that better models of sequence evolution should be devised: models that remain robust to tree reconstruction artefacts even under the most challenging conditions.

METHODS

We focus on a well-characterized LBA artefact analyzed in a previous phylogenomic study of the metazoan tree. In that study, two fast-evolving animal phyla, nematodes and platyhelminths, emerge either at the base of all other Bilateria or within protostomes, depending on the outgroup. We use this artefactual result as a case study for comparing the robustness of two alternative models: a standard site-homogeneous model based on an empirical matrix of amino-acid replacement (WAG), and a site-heterogeneous mixture model (CAT). In parallel, we propose a posterior predictive test that measures how well a model accounts for sequence saturation.
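
To make the contrast between the two model families concrete, the following is a minimal sketch of a CAT-like profile mixture built from F81-type (Poisson) components. Each class has its own equilibrium frequency profile, and a site's likelihood is averaged over the classes using the class weights. A site-homogeneous model is the special case with a single class. This is a hedged illustration, not the authors' implementation, and it rests on several assumptions:

- a toy 4-state alphabet stands in for the 20 amino acids;
- two sequences are compared instead of a full tree;
- the profiles and weights are hypothetical;
- WAG's empirical exchangeabilities are not modelled;
- all function names are illustrative.

```python
import math

def f81_transition(profile, t):
    """Transition matrix P(t) of an F81-type model with equilibrium `profile`.

    Q[i][j] = mu * profile[j] for i != j, with mu = 1 / (1 - sum(p^2)),
    so that t is measured in expected substitutions per site.
    """
    mu = 1.0 / (1.0 - sum(p * p for p in profile))
    decay = math.exp(-mu * t)
    n = len(profile)
    return [[(decay if i == j else 0.0) + (1.0 - decay) * profile[j]
             for j in range(n)] for i in range(n)]

def pair_site_likelihood(x, y, t, profiles, weights):
    """Probability of states x and y at one site of two sequences at distance t,
    averaged over mixture classes (a single class = site-homogeneous model)."""
    total = 0.0
    for w, profile in zip(weights, profiles):
        P = f81_transition(profile, t)
        total += w * profile[x] * P[x][y]
    return total

def identity_probability(t, profiles, weights):
    """Expected fraction of identical sites between two sequences at distance t."""
    n = len(profiles[0])
    return sum(pair_site_likelihood(x, x, t, profiles, weights) for x in range(n))

# Hypothetical models with the same overall frequencies (0.25 per state).
flat = ([[0.25, 0.25, 0.25, 0.25]], [1.0])
mixture = ([[0.45, 0.45, 0.05, 0.05], [0.05, 0.05, 0.45, 0.45]], [0.5, 0.5])

for t in (0.0, 0.5, 2.0, 50.0):
    print(t, round(identity_probability(t, *flat), 3),
          round(identity_probability(t, *mixture), 3))
```

Both toy models share the same overall state frequencies. At large distances, however, the single flat profile predicts an identity of 0.25, whereas the mixture plateaus at 0.41 (the weighted sum of squared class frequencies). The mixture therefore expects more convergences and reversions between distant sequences, in line with the saturation argument of this study.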

RESULTS

Adopting a Bayesian framework, we show that the LBA artefact observed under WAG disappears when the site-heterogeneous model CAT is used. Using cross-validation, we further demonstrate that CAT has a better statistical fit than WAG on this data set. Finally, using our statistical goodness-of-fit test, we show that CAT, but not WAG, correctly accounts for the overall level of saturation, and that this is due to a better estimation of site-specific amino-acid preferences.
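
The general logic of a posterior predictive goodness-of-fit test can be sketched as follows. The procedure is:

1. Compute a statistic on the observed alignment.
2. Simulate replicate alignments from parameter values drawn from the posterior.
3. Report how often the replicate statistic is at least as large as the observed one.

This sketch rests on several assumptions, and all names are hypothetical:

- The statistic (mean number of distinct states per column) is an illustrative choice, not necessarily the one used by the authors.
- The simulator is a hypothetical, fully saturated star tree in which every taxon draws its state independently from the column's profile.
- The "posterior" is a point mass. A real analysis would take the draws from the MCMC sample.

```python
import random

STATES = "ACGT"  # toy 4-state alphabet standing in for the 20 amino acids

def mean_site_diversity(alignment):
    """Mean number of distinct states per column (alignment: equal-length strings)."""
    ncol = len(alignment[0])
    return sum(len({seq[c] for seq in alignment}) for c in range(ncol)) / ncol

def simulate_saturated(theta, rng, ntaxa=10, nsites=50):
    """Hypothetical simulator for a fully saturated star tree.

    theta = (profiles, weights). Each column picks one profile class, then every
    taxon draws its state independently from that profile.
    """
    profiles, weights = theta
    columns = []
    for _ in range(nsites):
        profile = rng.choices(profiles, weights=weights)[0]
        columns.append(rng.choices(STATES, weights=profile, k=ntaxa))
    # Transpose the columns into one sequence string per taxon.
    return ["".join(col[i] for col in columns) for i in range(ntaxa)]

def posterior_predictive_pvalue(observed, posterior_draws, simulate, statistic, rng):
    """Fraction of replicate data sets whose statistic is >= the observed value.

    Values near 0 or 1 mean the model fails to reproduce the observed statistic.
    """
    t_obs = statistic(observed)
    t_rep = [statistic(simulate(theta, rng)) for theta in posterior_draws]
    return sum(t >= t_obs for t in t_rep) / len(t_rep)

flat = ([[0.25, 0.25, 0.25, 0.25]], [1.0])
mixture = ([[0.45, 0.45, 0.05, 0.05], [0.05, 0.05, 0.45, 0.45]], [0.5, 0.5])

# "Observed" data generated from the mixture; point-mass posterior (assumption).
observed = simulate_saturated(mixture, random.Random(2))
rng = random.Random(1)
p_flat = posterior_predictive_pvalue(observed, [flat] * 100, simulate_saturated,
                                     mean_site_diversity, rng)
p_mix = posterior_predictive_pvalue(observed, [mixture] * 100, simulate_saturated,
                                    mean_site_diversity, rng)
print(f"flat model: p = {p_flat:.2f}; mixture model: p = {p_mix:.2f}")
```

With these settings, the flat single-profile model expects about 3.8 distinct states per column, against about 2.8 under the concentrated profiles that generated the data. Nearly all flat-model replicates therefore exceed the observed value and `p_flat` is close to 1, which flags the misfit. The mixture yields an intermediate value.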

CONCLUSION

The CAT model appears to be more robust than WAG against LBA artefacts, essentially because it correctly anticipates the high probability of convergences and reversions implied by the small effective size of the amino-acid alphabet at each site of the alignment. More generally, our results provide strong evidence that site-specificities in the substitution process need to be accounted for in order to obtain more reliable phylogenetic trees.
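
The link between a small effective alphabet and frequent convergence can be shown with a short worked example. Suppose two lineages have each reached the stationary profile π at a site. They then carry the same residue by chance with probability Σ_a π_a². The inverse of that quantity (the inverse Simpson index) is one common way to define an effective alphabet size. This definition and the three-residue profile below are illustrative assumptions, not quantities reported in the paper.

```python
def chance_identity(profile):
    """Probability that two independent draws from `profile` give the same state."""
    return sum(p * p for p in profile)

def effective_alphabet_size(profile):
    """Inverse Simpson index: number of equally frequent states with the same chance identity."""
    return 1.0 / chance_identity(profile)

flat20 = [1.0 / 20] * 20  # all 20 amino acids equally likely
site = [0.4, 0.3, 0.3] + [0.0] * 17  # hypothetical site dominated by three residues

print(chance_identity(flat20), effective_alphabet_size(flat20))
print(chance_identity(site), effective_alphabet_size(site))
```

The flat profile gives a chance identity of 0.05 (effective size 20). The hypothetical site gives 0.34 (effective size about 2.9), so convergent or reverted residues are roughly seven times more likely at such a site. A single broad profile shared by all sites would miss this.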

Similar articles

An empirical assessment of long-branch attraction artefacts in deep eukaryotic phylogenomics.

Syst Biol. 2005 Oct;54(5):743-57. doi: 10.1080/10635150500234609.

A class frequency mixture model that adjusts for site-specific amino acid frequencies and improves inference of protein phylogeny.

BMC Evol Biol. 2008 Dec 16;8:331. doi: 10.1186/1471-2148-8-331.

Acoel flatworms are not platyhelminthes: evidence from phylogenomics.

PLoS One. 2007 Aug 8;2(8):e717. doi: 10.1371/journal.pone.0000717.

The Relative Importance of Modeling Site Pattern Heterogeneity Versus Partition-Wise Heterotachy in Phylogenomic Inference.

Syst Biol. 2019 Nov 1;68(6):1003-1019. doi: 10.1093/sysbio/syz021.

Modeling Site Heterogeneity with Posterior Mean Site Frequency Profiles Accelerates Accurate Phylogenomic Estimation.

Syst Biol. 2018 Mar 1;67(2):216-235. doi: 10.1093/sysbio/syx068.

Who Let the CAT Out of the Bag? Accurately Dealing with Substitutional Heterogeneity in Phylogenomic Analyses.

Syst Biol. 2017 Mar 1;66(2):232-255. doi: 10.1093/sysbio/syw084.

Is Over-parameterization a Problem for Profile Mixture Models?

Syst Biol. 2024 May 27;73(1):53-75. doi: 10.1093/sysbio/syad063.

What is the phylogenetic signal limit from mitogenomes? The reconciliation between mitochondrial and nuclear data in the Insecta class phylogeny.

BMC Evol Biol. 2011 Oct 27;11:315. doi: 10.1186/1471-2148-11-315.

Model Choice, Missing Data, and Taxon Sampling Impact Phylogenomic Inference of Deep Basidiomycota Relationships.

Syst Biol. 2020 Jan 1;69(1):17-37. doi: 10.1093/sysbio/syz029.

Cited by

Stochastic Character Mapping: An Under-Exploited Approach to the Study of Molecular Evolution.

J Mol Evol. 2025 Aug;93(4):465-473. doi: 10.1007/s00239-025-10257-5. Epub 2025 Jul 8.

n. sp.: a novel predatory flagellate illuminates the character evolution within the eukaryotic clade CRuMs.

Open Biol. 2025 Jun;15(6):250057. doi: 10.1098/rsob.250057. Epub 2025 Jun 4.

An obligate symbiont of with a strongly reduced genome resembles symbiotic bacteria in sucking lice.

Appl Environ Microbiol. 2025 Jun 18;91(6):e0022025. doi: 10.1128/aem.00220-25. Epub 2025 May 14.

Robustness of Ancestral Sequence Reconstruction to Among-site and Among-lineage Evolutionary Heterogeneity.

Mol Biol Evol. 2025 Apr 1;42(4). doi: 10.1093/molbev/msaf084.

Evolutionary History of Bilaterian FoxP Genes: Complex Ancestral Functions and Evolutionary Changes Spanning 2R-WGD in the Vertebrate Lineage.

Mol Biol Evol. 2025 Apr 1;42(4). doi: 10.1093/molbev/msaf072.

The evolution of the plastid genomes in the holoparasitic Balanophoraceae.

Proc Biol Sci. 2025 Mar;292(2043):20242011. doi: 10.1098/rspb.2024.2011. Epub 2025 Mar 26.

Myxozoan parasite genomes assembled from contaminated host data reveal extensive gene order conservation and rapid sequence evolution.

G3 (Bethesda). 2025 Jul 9;15(7). doi: 10.1093/g3journal/jkaf061.

Unraveling myriapod evolution: sealion, a novel quartet-based approach for evaluating phylogenetic uncertainty.

NAR Genom Bioinform. 2025 Mar 7;7(1):lqaf018. doi: 10.1093/nargab/lqaf018. eCollection 2025 Mar.

Phylogenetic relationships and divergence times of Odonata inferred from mitochondrial genome.

iScience. 2025 Jan 11;28(2):111806. doi: 10.1016/j.isci.2025.111806. eCollection 2025 Feb 21.

CAT-Posterior Mean Site Frequencies Improves Phylogenetic Modeling Under Maximum Likelihood and Resolves Tardigrada as the Sister of Arthropoda Plus Onychophora.

Genome Biol Evol. 2025 Jan 6;17(1). doi: 10.1093/gbe/evae273.

References

Conjugate Gibbs sampling for Bayesian phylogenetic models.

J Comput Biol. 2006 Dec;13(10):1701-22. doi: 10.1089/cmb.2006.13.1701.

Computing Bayes factors using thermodynamic integration.

Syst Biol. 2006 Apr;55(2):195-207. doi: 10.1080/10635150500433722.

Phylogenomics: the beginning of incongruence?

Trends Genet. 2006 Apr;22(4):225-31. doi: 10.1016/j.tig.2006.02.003. Epub 2006 Feb 21.

An empirical assessment of long-branch attraction artefacts in deep eukaryotic phylogenomics.

Syst Biol. 2005 Oct;54(5):743-57. doi: 10.1080/10635150500234609.

Phylogenomics and the reconstruction of the tree of life.

Nat Rev Genet. 2005 May;6(5):361-75. doi: 10.1038/nrg1603.

Site interdependence attributed to tertiary structure in amino acid sequence evolution.

Gene. 2005 Mar 14;347(2):207-17. doi: 10.1016/j.gene.2004.12.011. Epub 2005 Feb 19.

Multigene analyses of bilaterian animals corroborate the monophyly of Ecdysozoa, Lophotrochozoa, and Protostomia.

Mol Biol Evol. 2005 May;22(5):1246-53. doi: 10.1093/molbev/msi111. Epub 2005 Feb 9.

An alternative model of amino acid replacement.

Bioinformatics. 2005 Apr 1;21(7):975-80. doi: 10.1093/bioinformatics/bti109. Epub 2004 Nov 5.

Modeling compositional heterogeneity.

Syst Biol. 2004 Jun;53(3):485-95. doi: 10.1080/10635150490445779.

Genome-scale data, angiosperm relationships, and "ending incongruence": a cautionary tale in phylogenetics.

Trends Plant Sci. 2004 Oct;9(10):477-83. doi: 10.1016/j.tplants.2004.08.008.
