Wu Yang, Shi Binbin, Ding Xinqiang, Liu Tong, Hu Xihao, Yip Kevin Y, Yang Zheng Rong, Mathews David H, Lu Zhi John
MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Center for Plant Biology and Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing 100084, China.
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong, China.
Nucleic Acids Res. 2015 Sep 3;43(15):7247-59. doi: 10.1093/nar/gkv706. Epub 2015 Jul 13.
Recently, several experimental techniques have emerged for probing RNA structures based on high-throughput sequencing. However, most secondary structure prediction tools that incorporate probing data are designed and optimized for particular types of experiments. For example, RNAstructure-Fold is optimized for SHAPE data, while SeqFold is optimized for PARS data. Here, we report a new RNA secondary structure prediction method, restrained MaxExpect (RME), which can incorporate multiple types of experimental probing data and is based on a free energy model and an MEA (maximizing expected accuracy) algorithm. We first demonstrated that RME substantially improved secondary structure prediction with perfect restraints (base pair information of known structures). Next, we collected structure-probing data from diverse experiments (e.g. SHAPE, PARS and DMS-seq) and transformed them into a unified set of pairing probabilities with a posterior probabilistic model. By using the probability scores as restraints in RME, we compared its secondary structure prediction performance with two other well-known tools, RNAstructure-Fold (based on a free energy minimization algorithm) and SeqFold (based on a sampling algorithm). For SHAPE data, RME and RNAstructure-Fold performed better than SeqFold, because they markedly altered the energy model with the experimental restraints. For high-throughput data (e.g. PARS and DMS-seq) with lower probing efficiency, the secondary structure prediction performances of the tested tools were comparable, with performance improvements for only a portion of the tested RNAs. However, when the effects of tertiary structure and protein interactions were removed, RME showed the highest prediction accuracy in the DMS-accessible regions by incorporating in vivo DMS-seq data.
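The abstract states that probing signals from different experiments are transformed into a unified set of pairing probabilities with a posterior probabilistic model, but it does not give the model's form. The following is a minimal sketch of that idea using Bayes' rule. It is not the RME implementation. The Gaussian likelihoods, their parameters, the prior, and the names `gaussian_pdf` and `posterior_paired` are all assumptions made for illustration.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution; an assumed stand-in likelihood."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_paired(signal, paired=(0.2, 0.2), unpaired=(0.8, 0.4),
                     prior_paired=0.5):
    """Return P(paired | signal) by Bayes' rule.

    `paired` and `unpaired` are hypothetical (mean, sd) parameters of the
    signal distribution for paired and unpaired nucleotides. In practice
    they would be fitted separately for each experiment type on RNAs of
    known structure.
    """
    like_paired = gaussian_pdf(signal, *paired)
    like_unpaired = gaussian_pdf(signal, *unpaired)
    numerator = like_paired * prior_paired
    denominator = numerator + like_unpaired * (1.0 - prior_paired)
    return numerator / denominator


# Hypothetical normalized reactivities for four nucleotides.
signals = [0.05, 0.3, 0.9, 1.5]
probs = [posterior_paired(s) for s in signals]
```

Because each experiment type would get its own fitted likelihoods, the outputs all live on the same 0 to 1 scale, which is what allows a single prediction method to consume them as restraints regardless of the probing chemistry.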