An empirical assessment of long-branch attraction artefacts in deep eukaryotic phylogenomics.

Authors

Brinkmann Henner, van der Giezen Mark, Zhou Yan, Poncelin de Raucourt Gaëtan, Philippe Hervé

Affiliation

Canadian Institute for Advanced Research, Centre Robert Cedergren, Département de Biochimie, Université de Montréal, Succursale Centre-Ville, Montréal, Québec H3C3J7, Canada.

Publication

Syst Biol. 2005 Oct;54(5):743-57. doi: 10.1080/10635150500234609.

Abstract

In the context of exponentially growing molecular databases, it becomes increasingly easy to assemble large multigene data sets for phylogenomic studies. The expected increase of resolution due to the reduction of the sampling (stochastic) error is becoming a reality. However, the impact of systematic biases will also become more apparent or even dominant. We have chosen to study the case of the long-branch attraction artefact (LBA) using real instead of simulated sequences. Two fast-evolving eukaryotic lineages, whose evolutionary positions are well established, microsporidia and the nucleomorph of cryptophytes, were chosen as model species. A large data set was assembled (44 species, 133 genes, and 24,294 amino acid positions) and the resulting rooted eukaryotic phylogeny (using a distant archaeal outgroup) is positively misled by an LBA artefact despite the use of a maximum likelihood-based tree reconstruction method with a complex model of sequence evolution. When the fastest evolving proteins from the fast lineages are progressively removed (up to 90%), the bootstrap support for the apparently artefactual basal placement decreases to virtually 0%, and conversely only the expected placement, among all the possible locations of the fast-evolving species, receives increasing support that eventually converges to 100%. The percentage of removal of the fastest evolving proteins constitutes a reliable estimate of the sensitivity of phylogenetic inference to LBA. This protocol confirms that both a rich species sampling (especially the presence of a species that is closely related to the fast-evolving lineage) and a probabilistic method with a complex model are important to overcome the LBA artefact. Finally, we observed that phylogenetic inference methods perform strikingly better with simulated as opposed to real data, and suggest that testing the reliability of phylogenetic inference methods with simulated data leads to overconfidence in their performance. Although phylogenomic studies can be affected by systematic biases, the possibility of discarding a large amount of data containing most of the nonphylogenetic signal allows recovering a phylogeny that is less affected by systematic biases, while maintaining a high statistical support.
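
The progressive-removal step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the per-gene rate values, the function names `retained_genes` and `removal_series`, and the rounding rule are all assumptions made for illustration, and the abstract does not say how evolutionary rate was measured per protein.

```python
from typing import Dict, List, Sequence


def retained_genes(rates: Dict[str, float], fraction_removed: float) -> List[str]:
    """Return the genes kept after dropping the fastest-evolving fraction.

    `rates` maps a gene name to a hypothetical rate estimate for the
    fast-evolving lineage (how the rate is obtained is an assumption).
    """
    if not 0.0 <= fraction_removed <= 1.0:
        raise ValueError("fraction_removed must be between 0 and 1")
    # Sort slowest first, so the fastest genes sit at the end of the list.
    ordered = sorted(rates, key=lambda gene: rates[gene])
    n_removed = round(len(ordered) * fraction_removed)
    return ordered[: len(ordered) - n_removed]


def removal_series(
    rates: Dict[str, float], fractions: Sequence[float]
) -> Dict[float, List[str]]:
    """Build one retained gene set per removal fraction (e.g. 0.0 up to 0.9)."""
    return {fraction: retained_genes(rates, fraction) for fraction in fractions}


if __name__ == "__main__":
    # Made-up rates for four placeholder genes, purely for demonstration.
    example_rates = {"geneA": 0.1, "geneB": 0.5, "geneC": 0.3, "geneD": 0.9}
    for fraction, genes in removal_series(example_rates, [0.0, 0.25, 0.5]).items():
        print(fraction, genes)
```

In a full analysis, each retained gene set would be concatenated and passed to a tree-inference program with bootstrapping, so that support for the basal and the expected placements can be tracked across removal fractions; that step is outside this sketch.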
