Zoologisches Forschungsmuseum Alexander Koenig, Zentrum für molekulare Biodiversitätsforschung, Adenauerallee, Bonn, Germany.
BMC Evol Biol. 2011 May 27;11:146. doi: 10.1186/1471-2148-11-146.
Failure to account for covariation patterns in helical regions of ribosomal RNA (rRNA) genes can misdirect the estimation of phylogenetic signal in the data. Furthermore, extreme length variation among taxa, combined with regional substitution rate variation, can mislead the alignment of rRNA sequences and thus distort subsequent tree reconstructions. However, recent developments in phylogenetic methodology now allow a comprehensive integration of secondary structures in alignment and tree reconstruction analyses based on rRNA sequences, which has been shown to correct some of these problems. Here, we explore the potential of RNA substitution models and the interactions of specific model setups with the inherent pattern of covariation in rRNA stems and substitution rate variation among loop regions.
We found a clear impact of RNA substitution models on tree reconstruction analyses. The application of specific RNA models in tree reconstructions is hampered by an interaction between the appropriate modelling of covarying sites in stem regions and excessive homoplasy in some loop regions. RNA models often failed to recover reasonable trees when single-stranded regions were excessively homoplastic, because these regions contribute a greater proportion of the data when covarying sites are essentially downweighted. In this context, the RNA6A model outperformed all other models, including the more heavily parameterized RNA7 and RNA16 models.
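To make the difference between the model families more concrete, the following Python sketch shows one way paired stem positions could be mapped to character states under 6-, 7- and 16-state schemes. It assumes the usual description of these families: 16-state models treat every dinucleotide as its own state, 7-state models keep the six canonical and wobble pairs and lump all mismatches into one state, and 6-state models disregard mismatches. The function name `encode_pair` and the state numbering are illustrative assumptions, not taken from the study, and the rate-parameter constraints that distinguish RNA6A from other 6-state models are not represented.

```python
from typing import Optional

BASES = "ACGU"
# The six Watson-Crick and wobble pairs commonly treated as canonical.
CANONICAL = ("AU", "UA", "GC", "CG", "GU", "UG")
# All 16 ordered dinucleotides, in alphabetical order of BASES.
ALL_PAIRS = tuple(a + b for a in BASES for b in BASES)


def encode_pair(five_prime: str, three_prime: str, n_states: int) -> Optional[int]:
    """Map one base pair to a state index (hypothetical numbering).

    Returns None when the pair is treated as missing data: gaps or
    ambiguity codes in any scheme, and mismatches in the 6-state scheme.
    """
    if n_states not in (6, 7, 16):
        raise ValueError("n_states must be 6, 7 or 16")
    pair = (five_prime + three_prime).upper().replace("T", "U")
    if len(pair) != 2 or any(base not in BASES for base in pair):
        return None
    if n_states == 16:
        return ALL_PAIRS.index(pair)
    if pair in CANONICAL:
        return CANONICAL.index(pair)
    # Mismatch: one lumped state with 7 states, ignored with 6 states.
    return 6 if n_states == 7 else None
```

Under this encoding a mismatch such as A-A carries information in the 7- and 16-state schemes but is dropped in the 6-state scheme, which is one reason the schemes can weight the same stem columns differently.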
Our results depict a trade-off between increased accuracy in the estimation of interdependencies in helical regions and the risk of magnifying the influence of positions lacking phylogenetic signal. We therefore conclude that caution is warranted when applying rRNA covariation models, and suggest that loop regions be independently screened for phylogenetic signal and eliminated when they are indistinguishable from random noise. In addition to covariation and homoplasy, other factors, such as non-stationarity of substitution rates and base compositional heterogeneity, can disrupt the signal of ribosomal RNA data. All these factors call for sophisticated estimation of evolutionary patterns in rRNA data, just as other molecular data require similarly complicated (but different) corrections.
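Screening loop regions separately presupposes that alignment columns can be split into paired (stem) and unpaired (loop) sets. The sketch below shows one way to do this, assuming the consensus secondary structure is available as a dot-bracket string with one character per alignment column and no pseudoknots. The function names and the dictionary-based alignment are illustrative assumptions; the test for phylogenetic signal itself is not implemented here.

```python
from typing import Dict, List, Tuple


def split_stems_and_loops(structure: str) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Return (paired column index pairs, unpaired column indices).

    '(' and ')' mark paired columns; any other character is unpaired.
    """
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    loops: List[int] = []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at column {i}")
            pairs.append((stack.pop(), i))
        else:
            loops.append(i)
    if stack:
        raise ValueError(f"unmatched '(' at column {stack[-1]}")
    return sorted(pairs), loops


def extract_columns(alignment: Dict[str, str], columns: List[int]) -> Dict[str, str]:
    """Keep only the given columns of each aligned sequence."""
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in alignment.items()}
```

For example, `split_stems_and_loops("((..)).")` yields the pairs `(0, 5)` and `(1, 4)` and the loop columns `2, 3, 6`. The loop-only alignment from `extract_columns` could then be passed to whatever signal test is chosen.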