Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
BMC Bioinformatics. 2011 Feb 15;12 Suppl 1(Suppl 1):S38. doi: 10.1186/1471-2105-12-S1-S38.
Accurate and efficient structural alignment of non-coding RNAs (ncRNAs) has attracted increasing attention as recent studies have revealed the significance of ncRNAs in living organisms. Because Sankoff-style structural alignment algorithms do not scale efficiently to multiple sequences, progressive schemes are mostly used to reduce the complexity. However, progressive alignment tends to propagate early-stage errors throughout the entire process, thereby degrading the quality of the final alignment. For multiple protein sequence alignment, we have recently proposed PicXAA, which constructs an accurate alignment in a non-progressive fashion.
Here, we propose PicXAA-R as an extension of PicXAA for greedy structural alignment of ncRNAs. PicXAA-R efficiently captures both the folding information within each sequence and the local similarities between sequences. It uses a set of probabilistic consistency transformations to improve the posterior base-pairing and base alignment probabilities using information from all sequences in the alignment. Using a graph-based scheme, we greedily build up the structural alignment, starting from sequence regions with high base-pairing and base alignment probabilities.
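To make the idea of a probabilistic consistency transformation concrete, the following is a minimal sketch of the general inter-sequence form, in which the alignment probability of a residue pair is re-estimated by averaging over paths through every sequence in the set. It is an illustration under stated assumptions, not the PicXAA-R implementation: the exact transformations used by PicXAA-R (including those that act on base-pairing probabilities) are not specified here, and the names `consistency_transform`, `post`, and `lengths` are hypothetical.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
# post[(x, y)][i][j] is the posterior probability that position i of
# sequence x aligns to position j of sequence y, stored only for x < y.
PairPosteriors = Dict[Tuple[int, int], Matrix]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]


def consistency_transform(post: PairPosteriors, lengths: List[int]) -> PairPosteriors:
    """One round of a generic consistency transformation (assumed form).

    For every pair (x, y):
        P'(x_i ~ y_j) = (1 / N) * sum_z sum_k P(x_i ~ z_k) * P(z_k ~ y_j)
    where z ranges over all N sequences and P for a sequence against
    itself is taken to be the identity matrix.
    """
    n = len(lengths)

    def lookup(x: int, y: int) -> Matrix:
        if x == y:
            return identity(lengths[x])
        return post[(x, y)] if x < y else transpose(post[(y, x)])

    updated: PairPosteriors = {}
    for x in range(n):
        for y in range(x + 1, n):
            acc = [[0.0] * lengths[y] for _ in range(lengths[x])]
            for z in range(n):
                # Evidence for x_i ~ y_j routed through sequence z.
                via_z = matmul(lookup(x, z), lookup(z, y))
                for i, row in enumerate(via_z):
                    for j, value in enumerate(row):
                        acc[i][j] += value
            updated[(x, y)] = [[v / n for v in row] for row in acc]
    return updated
```

For example, calling `consistency_transform({(0, 1): [[0.9]], (0, 2): [[0.8]], (1, 2): [[0.5]]}, [1, 1, 1])` raises the weakly supported pair between sequences 1 and 2 because both positions align confidently to the same position in sequence 0. A greedy construction step would then consider residue pairs in decreasing order of these updated probabilities.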
Experiments on several datasets with different characteristics show that PicXAA-R is one of the fastest algorithms for structural alignment of multiple RNAs and that it consistently yields accurate alignments, especially for datasets with locally similar sequences. The PicXAA-R source code is freely available at http://www.ece.tamu.edu/~bjyoon/picxaa/.