Rangel L Thibério, Fournier Gregory P
Department of Earth, Atmospheric, & Planetary Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
Microorganisms. 2023 Oct 5;11(10):2499. doi: 10.3390/microorganisms11102499.
The trimming of fast-evolving sites, often known as "slow-fast" analysis, is broadly used in microbial phylogenetic reconstruction under the assumption that fast-evolving sites do not retain an accurate phylogenetic signal due to substitution saturation. Under this assumption, removing sites that have experienced multiple substitutions should improve the signal-to-noise ratio in phylogenetic analyses, leaving slower-evolving sites that preserve a more reliable record of evolutionary relationships. Here, we show that, contrary to this assumption, even the fastest-evolving sites present in the conserved proteins often used in Tree of Life studies contain reliable and valuable phylogenetic information, and that trimming such sites can reduce the accuracy of phylogenetic reconstruction. Simulated alignments modeled after ribosomal protein datasets used in Tree of Life studies consistently show that slow-evolving sites are less likely to recover true bipartitions than even the fastest-evolving sites. Moreover, site-specific substitution rates are positively correlated with the frequency of accurately recovered short-branched bipartitions, because slowly evolving sites are less likely to have experienced substitutions along these short intervals. Using published Tree of Life sequence alignment datasets, we also show that slow- and fast-evolving sites contain similarly inconsistent phylogenetic signals, and that, for fast-evolving sites, this inconsistency can be attributed to poor alignment quality. Finally, trimming fast sites, slow sites, or both has a substantial impact on phylogenetic reconstruction across multiple evolutionary models. This is perhaps most evident in the resulting placements of the Eukarya and Asgardarchaeota groups, which are especially sensitive to the choice of trimming scheme.
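The trimming schemes named in the abstract (removing fast sites, slow sites, or both) can be illustrated with a minimal sketch. It assumes that per-site substitution rates have already been estimated by some external tool and that the alignment is held as a plain dictionary of equal-length strings. The function name `trim_sites_by_rate`, its parameters, and the fraction-based cutoffs are hypothetical choices for illustration. The abstract does not state how the authors estimated rates or chose thresholds, so this should not be read as their procedure.

```python
def trim_sites_by_rate(alignment, rates, drop_fast=0.0, drop_slow=0.0):
    """Return a copy of `alignment` without its fastest and/or slowest columns.

    alignment : dict mapping taxon name -> aligned sequence (all the same length)
    rates     : one estimated substitution rate per alignment column
    drop_fast : fraction of columns to remove from the fast end (assumed cutoff)
    drop_slow : fraction of columns to remove from the slow end (assumed cutoff)
    """
    n_sites = len(rates)
    if any(len(seq) != n_sites for seq in alignment.values()):
        raise ValueError("each sequence needs exactly one residue per rate")
    if drop_fast < 0 or drop_slow < 0 or drop_fast + drop_slow > 1:
        raise ValueError("drop fractions must be non-negative and sum to at most 1")

    # Column indices ordered from the slowest site to the fastest site.
    order = sorted(range(n_sites), key=lambda i: rates[i])

    # Number of columns to discard at each end (rounded down).
    n_slow = int(n_sites * drop_slow)
    n_fast = int(n_sites * drop_fast)

    # Keep the middle of the rate ranking, restored to alignment order.
    kept = sorted(order[n_slow:n_sites - n_fast])

    return {name: "".join(seq[i] for i in kept)
            for name, seq in alignment.items()}


# Toy example: five columns with made-up rates.
toy_alignment = {"taxonA": "ACDEF", "taxonB": "ACDEY", "taxonC": "GCDQW"}
toy_rates = [0.5, 0.0, 0.1, 1.2, 3.0]

fast_trimmed = trim_sites_by_rate(toy_alignment, toy_rates, drop_fast=0.2)
slow_trimmed = trim_sites_by_rate(toy_alignment, toy_rates, drop_slow=0.2)
both_trimmed = trim_sites_by_rate(toy_alignment, toy_rates,
                                  drop_fast=0.2, drop_slow=0.2)
```

Reconstructing a tree from each of the three trimmed alignments and comparing the resulting bipartitions is the kind of comparison the abstract describes. The tree inference step itself is outside this sketch.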