Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China.
School of Mathematical and Computational Sciences, Massey University, Palmerston North, 4472, New Zealand.
BMC Bioinformatics. 2022 Jun 29;23(1):257. doi: 10.1186/s12859-022-04798-5.
Using efficient computational approaches to analyze RNA-binding protein (RBP) binding sites, and thereby avoid the laborious nature of traditional biological experiments, has long been a challenging task. RBPs play a vital role in post-transcriptional control. Identifying RBP binding sites is a key step in dissecting the essential mechanisms of gene regulation, which RBPs exert by controlling splicing, stability, localization and translation. Traditional methods for detecting RBP binding sites are time-consuming and computationally intensive. Recently, computational methods have been incorporated into RBP research. However, many of them rely not only on RNA sequence data but also on additional data, such as RNA secondary structure, to improve prediction performance, and this requires preprocessing to prepare a learnable representation of the structural data.
To reduce dependence on such preprocessing, in this paper we introduce DeepPN, a deep parallel neural network built from a convolutional neural network (CNN) and a graph convolutional network (GCN) for detecting RBP binding sites. It runs a two-layer CNN and GCN in parallel to extract hidden features, followed by a fully connected layer that makes the prediction. DeepPN discriminates RBP binding sites using a learnable representation of RNA sequences; it uses only sequence data and no other data, such as RNA secondary or tertiary structure. DeepPN is evaluated on 24 RBP binding-site datasets and compared with other state-of-the-art methods. The results show that the performance of DeepPN is comparable to that of published methods.
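The parallel CNN + GCN design described above can be illustrated with a small forward pass. This is a minimal pure-Python sketch under stated assumptions, not the DeepPN implementation. The abstract does not say how a graph is obtained from the sequence alone, so the hypothetical `chain_adjacency` simply links consecutive nucleotides (plus self-loops). Both branches are assumed to have two layers. The GCN layer uses the standard symmetric-normalized propagation rule of Kipf and Welling. The class name `ParallelCnnGcn`, the pooling choices, the layer widths and the untrained random weights are all illustrative.

```python
import math
import random

NUCLEOTIDES = "ACGU"


def one_hot(seq):
    """Encode an RNA sequence as L rows of 4-dim one-hot vectors (T is read as U)."""
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES]
            for base in seq.upper().replace("T", "U")]


def rand_matrix(rng, n_rows, n_cols, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(n_cols)] for _ in range(n_rows)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def conv1d(x, kernels):
    """Valid 1D convolution: x is L x C_in, each kernel is k x C_in; output is (L-k+1) x C_out."""
    k = len(kernels[0])
    out = []
    for start in range(len(x) - k + 1):
        window = x[start:start + k]
        out.append([sum(w * v for w_row, x_row in zip(kern, window) for w, v in zip(w_row, x_row))
                     for kern in kernels])
    return out


def chain_adjacency(n):
    """Hypothetical graph: each nucleotide is a node linked to its neighbours, plus self-loops."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0
        if i + 1 < n:
            a[i][i + 1] = a[i + 1][i] = 1.0
    return a


def normalize(a):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2; self-loops are already in `a`."""
    deg = [sum(row) for row in a]
    n = len(a)
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_hat, h, w):
    """One GCN layer: ReLU(A_hat H W)."""
    return relu(matmul(matmul(a_hat, h), w))


def max_pool(m):
    return [max(col) for col in zip(*m)]


def mean_pool(m):
    return [sum(col) / len(m) for col in zip(*m)]


class ParallelCnnGcn:
    """Hypothetical two-branch model: CNN and GCN run in parallel, then one fully connected layer."""

    def __init__(self, hidden=8, kernel=3, seed=0):
        rng = random.Random(seed)  # untrained random weights, for illustration only
        self.kernel = kernel
        self.conv1 = [rand_matrix(rng, kernel, 4) for _ in range(hidden)]
        self.conv2 = [rand_matrix(rng, kernel, hidden) for _ in range(hidden)]
        self.gcn1 = rand_matrix(rng, 4, hidden)
        self.gcn2 = rand_matrix(rng, hidden, hidden)
        self.fc = rand_matrix(rng, 2 * hidden, 1)
        self.bias = 0.0

    def predict(self, seq):
        x = one_hot(seq)
        if len(x) < 2 * self.kernel - 1:
            raise ValueError("sequence too short for two valid convolutions")
        # CNN branch: two convolutions, then max-pool over positions.
        cnn_feat = max_pool(relu(conv1d(relu(conv1d(x, self.conv1)), self.conv2)))
        # GCN branch: two graph convolutions, then mean-pool over nodes.
        a_hat = normalize(chain_adjacency(len(x)))
        gcn_feat = mean_pool(gcn_layer(a_hat, gcn_layer(a_hat, x, self.gcn1), self.gcn2))
        # Fuse both feature vectors and score with a fully connected layer + sigmoid.
        logit = matmul([cnn_feat + gcn_feat], self.fc)[0][0] + self.bias
        return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    model = ParallelCnnGcn()
    print(f"binding probability: {model.predict('AUGGCUACGUAGCUAGCUAG'):.3f}")
```

Because both branches read the same one-hot input independently, their pooled feature vectors can simply be concatenated before the fully connected layer. In a real model the weights would be learned from labeled binding and non-binding sites rather than drawn at random, so the printed probability here carries no biological meaning.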
The experimental results show that DeepPN can effectively capture latent features related to RBPs and use these features to predict binding sites effectively.