Hofacker Ivo L, Bernhart Stephan H F, Stadler Peter F
Institut für Theoretische Chemie und Molekulare Strukturbiologie, Universität Wien, Währingerstrasse 17, Vienna, Austria.
Bioinformatics. 2004 Sep 22;20(14):2222-7. doi: 10.1093/bioinformatics/bth229. Epub 2004 Apr 8.
Many classes of functional RNA molecules are characterized by highly conserved secondary structures but little detectable sequence similarity. Reliable multiple alignments can therefore be constructed only when the shared structural features are taken into account. Since multiple alignments are used as input for many subsequent methods of data analysis, structure-based alignments are indispensable in RNA bioinformatics.
We present here a method to compute pairwise and progressive multiple alignments from the direct comparison of base pairing probability matrices. Instead of attempting to solve the folding and the alignment problem simultaneously, as in the classical Sankoff algorithm, we use McCaskill's approach to compute base pairing probability matrices, which effectively incorporate the information on the energetics of each sequence. A novel, simplified variant of Sankoff's algorithm can then be employed to extract the maximum-weight common secondary structure and an associated alignment.
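To illustrate the idea of aligning two sequences by their base pairing matrices, the following is a minimal Python sketch of a Sankoff-style recursion over precomputed pair weights. It is not the pmcomp implementation: the function name `align_pair_matrices`, the sparse dictionaries `pa` and `pb` (pair weights keyed by 0-based positions), and the scoring parameters `gamma`, `match` and `mismatch` are all assumptions made for this example. The pair weights are taken as given rather than computed with McCaskill's algorithm, no minimum hairpin loop size is enforced, and only the optimal score is returned, not the alignment itself.

```python
from functools import lru_cache


def align_pair_matrices(a, b, pa, pb, gamma=-1.0, match=1.0, mismatch=0.0):
    """Score the best joint alignment and common structure of a and b.

    pa, pb: dicts mapping (i, j) with i < j (0-based) to a positive pair
            weight; missing entries mean the pair is not allowed.
    gamma:  score added for each gapped position (assumed linear gap cost).
    """

    def sigma(x, y):
        # Sequence score for aligning two unpaired bases (toy scoring).
        return match if x == y else mismatch

    @lru_cache(maxsize=None)
    def S(i, j, k, l):
        # Best score for aligning a[i..j] with b[k..l] (inclusive bounds).
        len_a = j - i + 1
        len_b = l - k + 1
        if len_a <= 0:
            return gamma * max(len_b, 0)  # only gaps remain
        if len_b <= 0:
            return gamma * len_a

        best = max(
            S(i + 1, j, k, l) + gamma,                   # a[i] against a gap
            S(i, j, k + 1, l) + gamma,                   # b[k] against a gap
            S(i + 1, j, k + 1, l) + sigma(a[i], b[k]),   # unpaired match
        )
        # Match base pair (i, h) in a with base pair (k, q) in b.
        for h in range(i + 1, j + 1):
            wa = pa.get((i, h), 0.0)
            if wa <= 0.0:
                continue
            for q in range(k + 1, l + 1):
                wb = pb.get((k, q), 0.0)
                if wb <= 0.0:
                    continue
                inner = S(i + 1, h - 1, k + 1, q - 1)  # enclosed by the pairs
                rest = S(h + 1, j, q + 1, l)           # to the right of them
                best = max(best, inner + wa + wb + rest)
        return best

    return S(0, len(a) - 1, 0, len(b) - 1)


if __name__ == "__main__":
    pairs_a = {(0, 4): 2.0}  # hypothetical weight for the G-C pair in GAAAC
    pairs_b = {(0, 3): 2.0}  # hypothetical weight for the G-C pair in GAAC
    print(align_pair_matrices("GAAAC", "GAAC", pairs_a, pairs_b))
```

Because folding information enters only through the fixed weights in `pa` and `pb`, the recursion never evaluates an energy model, which is the simplification relative to solving folding and alignment simultaneously. In this toy form the memoized recursion still has O(n^2 m^2) states, so it is only suitable for very short sequences.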
The programs pmcomp and pmmulti described in this contribution are implemented in Perl and can be downloaded together with the example datasets from http://www.tbi.univie.ac.at/RNA/PMcomp/. A web server is available at http://rna.tbi.univie.ac.at/cgi-bin/pmcgi.pl