Detecting and overcoming systematic errors in genome-scale phylogenies.

Authors

Rodríguez-Ezpeleta Naiara, Brinkmann Henner, Roure Béatrice, Lartillot Nicolas, Lang B Franz, Philippe Hervé

Affiliation

Canadian Institute for Advanced Research, Centre Robert Cedergren, Département de Biochimie, Université de Montréal, 2900 Boulevard Edouard-Montpetit, Montréal, Québec, H3T 1J4, Canada.

Publication information

Syst Biol. 2007 Jun;56(3):389-99. doi: 10.1080/10635150701397643.

Abstract

Genome-scale data sets result in an enhanced resolution of the phylogenetic inference by reducing stochastic errors. However, there is also an increase of systematic errors due to model violations, which can lead to erroneous phylogenies. Here, we explore the impact of systematic errors on the resolution of the eukaryotic phylogeny using a data set of 143 nuclear-encoded proteins from 37 species. The initial observation was that, despite the impressive amount of data, some branches had no significant statistical support. To demonstrate that this lack of resolution is due to a mutual annihilation of phylogenetic and nonphylogenetic signals, we created a series of data sets with slightly different taxon sampling. As expected, these data sets yielded strongly supported but mutually exclusive trees, thus confirming the presence of conflicting phylogenetic and nonphylogenetic signals in the original data set. To decide on the correct tree, we applied several methods expected to reduce the impact of some kinds of systematic error. Briefly, we show that (i) removing fast-evolving positions, (ii) recoding amino acids into functional categories, and (iii) using a site-heterogeneous mixture model (CAT) are three effective means of increasing the ratio of phylogenetic to nonphylogenetic signal. Finally, our results allow us to formulate guidelines for detecting and overcoming phylogenetic artefacts in genome-scale phylogenetic analyses.
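
Methods (i) and (ii) named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The six-category Dayhoff grouping is an assumed recoding scheme, since the abstract does not name one, and the per-site rates and the `max_rate` cutoff are hypothetical inputs that would have to come from a separate rate estimation step. The function names `recode_sequence` and `drop_fast_sites` are illustrative only.

```python
# Assumed scheme: the six Dayhoff categories (the abstract does not specify one).
DAYHOFF_GROUPS = {
    "AGPST": "1",
    "DENQ": "2",
    "HKR": "3",
    "ILMV": "4",
    "FWY": "5",
    "C": "6",
}

# Flatten the groups into a per-residue lookup table.
RECODE = {aa: code for group, code in DAYHOFF_GROUPS.items() for aa in group}


def recode_sequence(seq: str) -> str:
    """Replace each amino acid with its category; keep gaps, mark others as '?'."""
    return "".join(c if c == "-" else RECODE.get(c.upper(), "?") for c in seq)


def drop_fast_sites(alignment: dict, site_rates: list, max_rate: float) -> dict:
    """Keep only alignment columns whose (externally estimated) rate is <= max_rate."""
    keep = [i for i, rate in enumerate(site_rates) if rate <= max_rate]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


if __name__ == "__main__":
    # Toy alignment and toy rates, purely illustrative.
    toy = {"taxonA": "ACDE", "taxonB": "GCNQ"}
    slow = drop_fast_sites(toy, [0.1, 2.0, 0.5, 3.0], max_rate=1.0)
    print({name: recode_sequence(seq) for name, seq in slow.items()})
```

Recoding collapses substitutions within a category, so that frequent exchanges between biochemically similar residues no longer contribute to the inferred tree. Filtering by rate removes the columns most likely to be saturated. Both steps discard data, so the cutoff and the grouping are choices to be justified case by case.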
