Theis Corinna, Zirbel Craig L, Höner zu Siederdissen Christian, Anthon Christian, Hofacker Ivo L, Nielsen Henrik, Gorodkin Jan
Center for non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark; Department of Veterinary Clinical and Animal Science, Faculty of Health and Medical Science, University of Copenhagen, Frederiksberg, Denmark.
Department of Mathematics and Statistics, Bowling Green State University, Bowling Green, Ohio, United States of America.
PLoS One. 2015 Oct 28;10(10):e0139900. doi: 10.1371/journal.pone.0139900. eCollection 2015.
Recent experimental and computational progress has revealed a large potential for RNA structure in the genome. This has been driven by computational strategies that exploit multiple genomes of related organisms to identify common sequences and secondary structures. However, these computational approaches face two main challenges: they are computationally expensive and they have a relatively high false discovery rate (FDR). Simultaneously, RNA 3D structure analysis has revealed modules composed of non-canonical base pairs which occur in non-homologous positions, apparently by independent evolution. These modules can, for example, occur inside structural elements which appear as internal loops in RNA 2D predictions. One question, then, is whether the use of such RNA 3D information can improve the prediction accuracy of RNA secondary structure at a genome-wide level. Here, we use RNAz in combination with 3D module prediction tools and apply them to a 13-way vertebrate sequence-based alignment. We find that RNA 3D modules predicted by metaRNAmodules and JAR3D are significantly enriched in the screened windows compared to their shuffled counterparts. The initially estimated FDR of 47.0% is lowered to below 25% when certain 3D module predictions are present in the window of the 2D prediction. We discuss the implications and prospects for further development of computational strategies for detection of RNA 2D structure in genomic sequence.
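The FDR comparison described in the abstract can be illustrated with a minimal sketch. It assumes the FDR is estimated empirically as the number of predictions in shuffled control windows divided by the number in native windows, with equally many windows screened in each set. The abstract does not spell out the authors' exact procedure, so this is an illustration and not their pipeline. The `Hit` type, the `empirical_fdr` function, and all counts are hypothetical and are not data from the study.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One window that passed the 2D structure screen (hypothetical record)."""
    shuffled: bool    # True if the window comes from a shuffled control alignment
    has_module: bool  # True if a 3D module was also predicted in the window


def empirical_fdr(hits: List[Hit], require_module: bool = False) -> float:
    """Estimate FDR as (hits in shuffled controls) / (hits in native windows).

    Assumes the same number of native and shuffled windows were screened.
    If require_module is True, only hits with a predicted 3D module count.
    """
    kept = [h for h in hits if h.has_module or not require_module]
    native = sum(1 for h in kept if not h.shuffled)
    control = sum(1 for h in kept if h.shuffled)
    if native == 0:
        raise ValueError("no native hits left; FDR is undefined")
    # Cap at 1.0, since controls can outnumber native hits by chance.
    return min(1.0, control / native)


# Made-up counts for illustration only.
example_hits = (
    [Hit(shuffled=False, has_module=True)] * 30
    + [Hit(shuffled=False, has_module=False)] * 70
    + [Hit(shuffled=True, has_module=True)] * 6
    + [Hit(shuffled=True, has_module=False)] * 44
)

fdr_all = empirical_fdr(example_hits)
fdr_with_module = empirical_fdr(example_hits, require_module=True)
print(f"FDR, all 2D hits: {fdr_all:.2f}")
print(f"FDR, 2D hits with a 3D module: {fdr_with_module:.2f}")
```

In this toy data set, requiring a 3D module lowers the estimate because modules are rarer among shuffled hits than among native ones, which is the kind of enrichment the abstract reports.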