Corley Meredith, Solem Amanda, Qu Kun, Chang Howard Y, Laederach Alain
Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
Nucleic Acids Res. 2015 Feb 18;43(3):1859-68. doi: 10.1093/nar/gkv010. Epub 2015 Jan 23.
Ribonucleic acid (RNA) secondary structure prediction continues to be a significant challenge, in particular when attempting to model sequences with less rigidly defined structures, such as messenger and non-coding RNAs. Crucial to interpreting RNA structures as they pertain to individual phenotypes is the ability to detect riboSNitches, that is, RNAs with large structural disparities caused by a single nucleotide variant (SNV). A recently published human genome-wide parallel analysis of RNA structure (PARS) study identified a large number of riboSNitches as well as non-riboSNitches, providing an unprecedented set of RNA sequences against which to benchmark structure prediction algorithms. Here we evaluate the riboSNitch prediction performance of 11 different RNA folding algorithms on these data. We find that recent algorithms designed specifically to predict the effects of SNVs on RNA structure, in particular remuRNA, RNAsnp and SNPfold, perform best on the most rigorously validated subsets of the benchmark data. In addition, our benchmark indicates that general structure prediction algorithms (e.g. RNAfold and RNAstructure) perform better overall if base pairing probabilities are considered rather than minimum free energy calculations. Although aggregate algorithmic performance on the full set of riboSNitches is relatively low, significant improvement is possible if the highest confidence predictions are evaluated independently.
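As a rough illustration of what comparing base pairing probabilities (rather than a single minimum free energy structure) can look like, the sketch below scores the structural disparity between a wild-type and a variant RNA. It is not the method of any algorithm named in the abstract. The function names, the choice of one minus the Pearson correlation as the score, and the assumption that a symmetric base pairing probability matrix has already been computed by some folding tool are all hypothetical choices made for this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A flat profile carries no shape information; treat as uncorrelated.
        return 0.0
    return cov / sqrt(var_x * var_y)


def pairing_profile(bpp):
    """Per-nucleotide probability of being paired.

    `bpp` is assumed to be a symmetric matrix (list of lists) where
    bpp[i][j] is the probability that nucleotides i and j pair.
    """
    return [sum(row) for row in bpp]


def structural_disparity(bpp_wt, bpp_variant):
    """Hypothetical disparity score: 1 - Pearson correlation of the two
    pairing profiles. Near 0 means similar profiles; larger values mean
    the variant changes which positions tend to be paired."""
    return 1.0 - pearson(pairing_profile(bpp_wt), pairing_profile(bpp_variant))
```

A score like this is continuous, so candidate riboSNitches can be ranked and only the highest-scoring predictions examined, which is one way to read the abstract's remark about evaluating the highest confidence predictions separately.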