Molecular Biodiversity Research Unit, Zoologisches Forschungsmuseum Alexander Koenig, Bonn, Germany.
Mol Biol Evol. 2010 Nov;27(11):2507-21. doi: 10.1093/molbev/msq140. Epub 2010 Jun 7.
The use of secondary structures has been advocated to improve both the alignment and the tree reconstruction processes of ribosomal RNA (rRNA) data sets. We used simulated and empirical rRNA data to test the impact of secondary structure consideration in both steps of molecular phylogenetic analyses. A simulation approach was used to generate realistic rRNA data sets based on real 16S, 18S, and 28S sequences and structures in combination with different branch lengths and topologies. The alignment and tree reconstruction performance of four recent structural alignment methods was compared with that of exclusively sequence-based approaches. As empirical data, we used a hexapod rRNA data set to study the influence of nucleotide interdependencies on sequence alignment and tree reconstruction. Structural alignment methods delivered significantly better sequence alignments than purely sequence-based methods. Structural alignment methods also delivered better trees, as judged by topological congruence with the simulation base trees. For tree reconstruction, however, the advantage of structural alignments was less pronounced and even vanished in several instances. For simulated data, applying mixed RNA/DNA models to stems and loops, respectively, led to significantly shorter branches. Applying mixed RNA/DNA models in the hexapod analyses delivered partly implausible relationships. This can be interpreted as a stronger sensitivity of mixed model setups to nonphylogenetic signal. Secondary structure consideration clearly influenced sequence alignment and tree reconstruction of ribosomal genes. Although sequence alignment quality can be considerably improved by the use of secondary structure information, the application of mixed models in tree reconstruction needs further study to understand the observed effects.
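The mixed RNA/DNA model setup described in the abstract treats paired stem positions and unpaired loop positions as separate partitions. A minimal sketch of that partitioning step follows. It assumes the consensus secondary structure is supplied in dot-bracket notation, which the abstract does not specify, and the function name `split_stems_and_loops` is hypothetical and not taken from the study.

```python
def split_stems_and_loops(structure):
    """Split alignment columns into stem pairs and loop columns.

    `structure` is a consensus secondary structure in dot-bracket
    notation (an assumption for this sketch): '(' and ')' mark paired
    stem columns; any other symbol (for example '.' or '-') is treated
    as an unpaired loop column. Column indices are 0-based.
    """
    open_stack = []     # columns of '(' still waiting for a partner
    stem_pairs = []     # (left_column, right_column) for each base pair
    loop_columns = []   # columns that are not part of any base pair

    for col, symbol in enumerate(structure):
        if symbol == "(":
            open_stack.append(col)
        elif symbol == ")":
            if not open_stack:
                raise ValueError(f"unmatched ')' at column {col}")
            # The most recent unmatched '(' pairs with this ')'.
            stem_pairs.append((open_stack.pop(), col))
        else:
            loop_columns.append(col)

    if open_stack:
        raise ValueError(f"unmatched '(' at column {open_stack[-1]}")

    return sorted(stem_pairs), loop_columns


if __name__ == "__main__":
    pairs, loops = split_stems_and_loops("((..)).")
    print("stem pairs:", pairs)
    print("loop columns:", loops)
```

In a partitioned analysis, the stem pairs would typically be assigned to a paired-site (RNA) substitution model and the loop columns to a standard nucleotide (DNA) model. How each phylogenetics program expects those partitions to be declared differs and is not covered here.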