Bindewald Eckart, Shapiro Bruce A
Basic Research Program, SAIC-Frederick, Inc, National Cancer Institute-Frederick, MD 21702, USA.
RNA. 2006 Mar;12(3):342-52. doi: 10.1261/rna.2164906.
We present a machine learning method (a hierarchical network of k-nearest neighbor classifiers) that uses an RNA sequence alignment to predict a consensus RNA secondary structure. The input to the network consists of the mutual information, the fraction of complementary nucleotides, and a novel consensus RNAfold secondary structure prediction for a pair of alignment columns and its nearest neighbors. From this input, the network predicts whether a particular pair of alignment columns corresponds to a base pair. On a comprehensive test set of 49 RFAM alignments, the program KNetFold achieves an average Matthews correlation coefficient of 0.81. This is a significant improvement over the secondary structure prediction methods PFOLD and RNAalifold. Using archaeal RNase P as an example, we show that the program can also predict pseudoknot interactions.
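The sketch below illustrates the column-pair ideas named in the abstract. It computes two of the three inputs (mutual information and the fraction of complementary nucleotides), classifies a feature vector with a plain k-nearest neighbor vote, and scores predictions with the Matthews correlation coefficient. It is a simplified illustration under stated assumptions, not KNetFold itself:

- A single k-NN layer stands in for the paper's hierarchical network.
- The consensus RNAfold input and the neighboring-column features are omitted.
- "Complementary" is assumed to mean Watson-Crick pairs plus G-U wobble.
- Euclidean distance is an assumed metric.
- All function names are hypothetical.

```python
import math
from collections import Counter

# Assumption: "complementary" = Watson-Crick pairs plus the G-U wobble pair;
# the paper's exact definition may differ.
COMPLEMENTARY = {("A", "U"), ("U", "A"), ("G", "C"),
                 ("C", "G"), ("G", "U"), ("U", "G")}


def mutual_information(col_i: str, col_j: str) -> float:
    """Mutual information (in bits) between two alignment columns.

    Each column is a string holding one character per aligned sequence.
    """
    n = len(col_i)
    f_i, f_j = Counter(col_i), Counter(col_j)
    f_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (x, y), count in f_ij.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((f_i[x] / n) * (f_j[y] / n)))
    return mi


def complementary_fraction(col_i: str, col_j: str) -> float:
    """Fraction of sequences whose nucleotides in the two columns can pair."""
    pairs = sum((x, y) in COMPLEMENTARY for x, y in zip(col_i, col_j))
    return pairs / len(col_i)


def knn_predict(query, training, k=3):
    """Majority vote of the k nearest labeled feature vectors.

    `training` is a list of (feature_tuple, is_base_pair) items. Euclidean
    distance is an assumption, not necessarily the paper's metric.
    """
    nearest = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    votes = sum(label for _, label in nearest)
    return votes * 2 > k


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

For example, the columns `"GGCA"` and `"CCGU"` covary perfectly and are fully complementary, giving a mutual information of 1.5 bits and a complementary fraction of 1.0. The conserved columns `"GGGG"` and `"CCCC"` are also fully complementary, yet their mutual information is 0. This gap is one reason a complementarity feature is useful alongside mutual information.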