Improved RNA secondary structure and tertiary base-pairing prediction using evolutionary profile, mutational coupling and two-dimensional transfer learning.

Author information

Singh Jaswinder, Paliwal Kuldip, Zhang Tongchuan, Singh Jaspreet, Litfin Thomas, Zhou Yaoqi

Affiliations

Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia.

Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, QLD, 4222, Australia.

Publication information

Bioinformatics. 2021 Sep 9;37(17):2589-2600. doi: 10.1093/bioinformatics/btab165.

DOI:10.1093/bioinformatics/btab165

PMID:33704363

Abstract

MOTIVATION

The recent discovery of numerous non-coding RNAs (long non-coding RNAs, in particular) has transformed our perception of the roles of RNAs in living organisms. Our ability to understand them, however, is hampered because existing experimental techniques cannot solve their secondary and tertiary structures efficiently at high resolution. Computational prediction of RNA secondary structure, on the other hand, has recently received a much-needed improvement through deep learning on a large approximate dataset, followed by transfer learning with gold-standard base-pairing structures derived from high-resolution 3-D structures. Here, we extend this single-sequence-based learning to use evolutionary profiles and mutational coupling.

RESULTS

The new method yields large improvements not only for canonical base-pairs (RNA secondary structures) but even more so for base-pairing associated with tertiary interactions, such as pseudoknots, non-canonical and lone base-pairs. In particular, it is highly accurate for RNAs with more than 1000 homologous sequences, achieving an F1-score (harmonic mean of sensitivity and precision) above 0.8 for 14 of the 16 RNAs tested. Without any modification, the method can also significantly improve base-pairing prediction by incorporating artificial but functional homologous sequences generated from deep mutational scanning. The fully automatic method (publicly available as a server and as standalone software) should provide the scientific community with a powerful new tool for capturing not only secondary structure but also tertiary base-pairing information for building three-dimensional models. It also points to a future in which base-pairing structures are solved accurately by using a large number of natural and/or artificial homologous sequences.
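
To make the F1-score quoted above concrete, the following is a minimal sketch of how precision, sensitivity and their harmonic mean could be computed for a set of predicted base-pairs. It is an illustration only, not the evaluation code of SPOT-RNA2: the function name `base_pair_f1` and the toy base-pair lists are assumptions made for this example, and the sketch counts a predicted pair as correct only when it matches a reference pair exactly.

```python
def base_pair_f1(predicted, reference):
    """Return (precision, sensitivity, F1) for two collections of base-pairs.

    Hypothetical helper for illustration; each base-pair is a tuple of two
    nucleotide positions, e.g. (1, 20).
    """
    # Normalise each pair so that (i, j) and (j, i) count as the same pair.
    pred = {tuple(sorted(p)) for p in predicted}
    ref = {tuple(sorted(p)) for p in reference}

    true_positives = len(pred & ref)

    # Precision: fraction of predicted pairs that are in the reference.
    precision = true_positives / len(pred) if pred else 0.0
    # Sensitivity (recall): fraction of reference pairs that were predicted.
    sensitivity = true_positives / len(ref) if ref else 0.0

    if precision + sensitivity == 0:
        return precision, sensitivity, 0.0

    # F1 is the harmonic mean of precision and sensitivity.
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return precision, sensitivity, f1


# Toy data (made up for this example, not taken from the paper).
reference = [(1, 20), (2, 19), (3, 18), (4, 17), (7, 30)]
predicted = [(1, 20), (2, 19), (3, 18), (5, 16)]

precision, sensitivity, f1 = base_pair_f1(predicted, reference)
print(f"precision={precision:.3f} sensitivity={sensitivity:.3f} F1={f1:.3f}")
```

In this toy case three of the four predicted pairs are correct and three of the five reference pairs are recovered, so the F1-score falls between the precision and the sensitivity, closer to the lower of the two.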

AVAILABILITY AND IMPLEMENTATION

A standalone version of SPOT-RNA2 is available at https://github.com/jaswindersingh2/SPOT-RNA2. Predictions can also be made directly at https://sparks-lab.org/server/spot-rna2/. The datasets used in this research can be downloaded from the GitHub repository and the web server mentioned above.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
