Single-sequence and profile-based prediction of RNA solvent accessibility using dilated convolutional neural network.

Author information

Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia.

Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, QLD 4222, Australia.

Publication information

Bioinformatics. 2021 Jan 29;36(21):5169-5176. doi: 10.1093/bioinformatics/btaa652.

Abstract

MOTIVATION

RNA solvent accessibility, like protein solvent accessibility, reflects the structural regions that are accessible to solvents or other functional biomolecules, and plays an important role in structural and functional characterization. Unlike for protein solvent accessibility, only a few tools are available for predicting RNA solvent accessibility, even though millions of RNA transcripts have unknown structures and functions. These tools also have limited accuracy. Here, we have developed RNAsnap2, which uses a dilated convolutional neural network with a new feature based on predicted base-pairing probabilities from LinearPartition.
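To make the term "dilated convolution" concrete, the following is a minimal pure-Python sketch of a single dilated 1D convolution over a per-nucleotide signal, together with the receptive field of a stack of such layers. It is an illustration only, not the RNAsnap2 architecture: the function names, kernel size, dilation rates and toy input are all assumptions chosen for the example.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Zero-padded ('same' length) 1D dilated convolution.

    Uses the cross-correlation form common in deep-learning libraries:
    kernel taps are placed `dilation` positions apart along the sequence.
    """
    k = len(kernel)
    span = dilation * (k - 1)      # distance between first and last tap
    left = span // 2               # zero-padding added on the left side
    padded = [0.0] * left + list(signal) + [0.0] * (span - left)
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(signal))
    ]


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated convolution layers."""
    return 1 + sum(d * (kernel_size - 1) for d in dilations)


# Toy input (hypothetical): a unit impulse at position 4 of a 9-nt sequence.
x = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
y = dilated_conv1d(x, [1.0, 1.0, 1.0], dilation=2)
print(y)                                # impulse spreads to positions 2, 4, 6
print(receptive_field(3, [1, 2, 4]))    # three layers with growing dilation
```

With a kernel of size 3 and dilation rates 1, 2 and 4, each output position sees 15 input positions, which is how dilation lets a shallow network capture longer-range context along the sequence without adding parameters.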

RESULTS

Using the same training set as the recent predictor RNAsol, RNAsnap2 provides an 11% improvement in median Pearson correlation coefficient (PCC) and a 9% improvement in mean absolute error on the same test set of 45 RNA chains. A larger improvement (22% in median PCC) is observed for 31 newly deposited RNA chains that are non-redundant with, and independent of, the training and test sets. A single-sequence version of RNAsnap2 (i.e. without sequence profiles generated from homology search by Infernal) achieves performance comparable to the profile-based RNAsol. In addition, RNAsnap2 achieves comparable performance for protein-bound and protein-free RNAs. Both RNAsnap2 and RNAsnap2 (SingleSeq) are expected to be useful for searching for structural signatures and locating functional regions of non-coding RNAs.
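The two evaluation metrics named above can be sketched as follows: a per-chain Pearson correlation coefficient between predicted and reference accessibility values, summarized by its median over chains, and a per-chain mean absolute error. This is a minimal illustration under stated assumptions, not the authors' evaluation script; the chain names and all numeric values are made up for the example.

```python
import math
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mae(x, y):
    """Mean absolute error between two equal-length lists."""
    return mean([abs(a - b) for a, b in zip(x, y)])


# Hypothetical toy data: chain id -> (reference values, predicted values).
chains = {
    "chain_A": ([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]),
    "chain_B": ([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]),
    "chain_C": ([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0]),
}

per_chain_pcc = {name: pearson(ref, pred) for name, (ref, pred) in chains.items()}
per_chain_mae = {name: mae(ref, pred) for name, (ref, pred) in chains.items()}

# Summarize across chains: median PCC and mean of the per-chain MAE values.
median_pcc = median(per_chain_pcc.values())
mean_mae = mean(per_chain_mae.values())
print(per_chain_pcc, median_pcc, mean_mae)
```

Reporting the median rather than the mean PCC makes the summary less sensitive to a few chains with very poor correlation, which matters when the test set contains only a few dozen chains.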

AVAILABILITY AND IMPLEMENTATION

Standalone versions of RNAsnap2 and RNAsnap2 (SingleSeq) are available at https://github.com/jaswindersingh2/RNAsnap2. Predictions can also be made directly at https://sparks-lab.org/server/rnasnap2. The datasets used in this research can be downloaded from the GitHub repository and the web server mentioned above.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
