Harmanci Arif Ozgun, Sharma Gaurav, Mathews David H
Department of Electrical and Computer Engineering, University of Rochester, Hopeman 204, RC Box 270126, Rochester, NY 14627, USA.
Nucleic Acids Res. 2008 Apr;36(7):2406-17. doi: 10.1093/nar/gkn043. Epub 2008 Feb 26.
A novel method is presented for jointly predicting the alignment and the common secondary structure of two RNA sequences. Common secondary structures and alignment are considered jointly by performing structural alignment over a search space defined by a newly introduced motif, the matched helical region. The matched helical region formulation generalizes the constraints previously used for structural alignment and therefore better accommodates structural variability within RNA families. Structural alignments are scored with a probabilistic model based on pseudo free energies derived from precomputed base pairing and alignment probabilities. A dynamic programming algorithm called PARTS obtains from this model the maximum a posteriori (MAP) common secondary structures and sequence alignment, as well as joint posterior probabilities of base pairing. The advantage of the more general structural alignment in PARTS is seen in secondary structure predictions for the RNase P family: for this family, PARTS MAP predictions of secondary structures and alignment perform significantly better than prior methods that use a more restrictive structural alignment model. For the tRNA and 5S rRNA families, the richer structural alignment model of PARTS offers no benefit, and the method therefore performs comparably to existing alternatives. For all RNA families studied, the posterior probability estimates from PARTS improve on those from single sequence prediction. When only base pairs predicted above a confidence threshold are considered, PARTS achieves a better combination of sensitivity and positive predictive value than single sequence prediction. PARTS source code is available for download under the GNU General Public License at http://rna.urmc.rochester.edu.
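The abstract compares methods by the sensitivity and positive predictive value (PPV) of base pairs whose posterior probability exceeds a confidence threshold. Below is a minimal Python sketch of that kind of evaluation. It assumes the standard definitions (sensitivity = correctly predicted pairs / known pairs, PPV = correctly predicted pairs / predicted pairs) and exact pair matching. The function names, toy posterior probabilities and reference structure are hypothetical and are not taken from PARTS or from the paper's benchmarks.

```python
from typing import Dict, Set, Tuple

# A base pair (i, j) with i < j, using 1-based nucleotide positions.
Pair = Tuple[int, int]


def pairs_above_threshold(posteriors: Dict[Pair, float], threshold: float) -> Set[Pair]:
    """Keep only the base pairs whose posterior probability meets the threshold."""
    return {pair for pair, p in posteriors.items() if p >= threshold}


def sensitivity_ppv(predicted: Set[Pair], known: Set[Pair]) -> Tuple[float, float]:
    """Sensitivity = TP / |known|; PPV = TP / |predicted| (exact pair matching assumed)."""
    tp = len(predicted & known)  # true positives: predicted pairs present in the reference
    sensitivity = tp / len(known) if known else 0.0
    ppv = tp / len(predicted) if predicted else 0.0
    return sensitivity, ppv


if __name__ == "__main__":
    # Hypothetical posterior pairing probabilities and reference structure for a toy RNA.
    posteriors = {(1, 20): 0.95, (2, 19): 0.90, (3, 18): 0.55, (5, 12): 0.60}
    known = {(1, 20), (2, 19), (3, 18), (4, 17)}
    for threshold in (0.5, 0.9):
        predicted = pairs_above_threshold(posteriors, threshold)
        sens, ppv = sensitivity_ppv(predicted, known)
        print(f"threshold={threshold}: sensitivity={sens:.2f}, PPV={ppv:.2f}")
    # prints:
    # threshold=0.5: sensitivity=0.75, PPV=0.75
    # threshold=0.9: sensitivity=0.50, PPV=1.00
```

Raising the threshold trades sensitivity for PPV, which is why the two measures are reported together. Some RNA structure benchmarks also count a pair shifted by one nucleotide as correct. This sketch does not.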