Zhang Shao-Wu, Wang Ya, Zhang Xi-Xi, Wang Jia-Qi
Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, China.
Anal Biochem. 2019 Oct 15;583:113364. doi: 10.1016/j.ab.2019.113364. Epub 2019 Jul 16.
Long non-coding RNAs (lncRNAs) play important roles in cells through their interactions with RNA-binding proteins (RBPs). Identifying RBP binding sites on lncRNA chains can help clarify post-transcriptional regulatory mechanisms, the pathogenesis of cancers, and possible roles in other diseases. Although many genome-wide experimental techniques can identify RNA-protein interactions and detect binding sites on RNA chains, they remain time-consuming, labor-intensive, and costly. Many computational methods have therefore been developed to predict RBP binding sites by integrating RNA sequence, structure, and domain-specific features. However, current approaches to predicting RBP binding sites on RNA chains do not consider the dependencies among nucleotides. In this work, we propose HOCNNLB, a method based on higher-order nucleotide encoding and a convolutional neural network (CNN), to predict RBP binding sites on lncRNA chains. HOCNNLB first encodes the lncRNA sequences with a high-order one-hot encoding strategy that captures the dependence among nucleotides, and then feeds the encoded sequences into the CNN to predict the RBP binding sites. We evaluate HOCNNLB on 31 experimental datasets of 12 lncRNA-binding proteins. HOCNNLB achieves an average AUC of 0.953, which is 0.247 and 0.175 higher than that of iDeepS and DeepBind, respectively. Its average accuracy is 90.2%, which is 26.8% and 19.5% higher than that of iDeepS and DeepBind, respectively. These results demonstrate that HOCNNLB can reliably predict RBP binding sites on lncRNA chains and outperforms state-of-the-art methods. The source code of HOCNNLB and the datasets used in this work are available to academic users at https://github.com/NWPU-903PR/HOCNNLB.
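To make the idea of high-order one-hot encoding concrete, the following is a minimal sketch of one plausible reading: each window of `order` consecutive nucleotides is mapped to a one-hot vector of length 4^order, so that neighbouring bases are encoded jointly rather than independently. The function name `high_order_one_hot`, the use of overlapping windows, the choice `order=2`, and the all-zero treatment of ambiguous bases are assumptions made for illustration; the abstract does not specify these details, and the HOCNNLB repository may implement them differently.

```python
from itertools import product

NUCLEOTIDES = "ACGU"


def high_order_one_hot(seq, order=2):
    """Encode an RNA sequence as overlapping k-mers (k = order).

    Returns a list of rows, one per k-mer window, where each row is a
    one-hot vector of length 4**order. This is an illustrative sketch,
    not the encoding code used by HOCNNLB.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    # Enumerate all possible k-mers in a fixed (lexicographic) order.
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=order)]
    index = {kmer: i for i, kmer in enumerate(kmers)}

    encoded = []
    for start in range(len(seq) - order + 1):
        row = [0] * len(kmers)
        kmer = seq[start:start + order]
        # Assumption: windows containing ambiguous bases (e.g. N) stay all-zero.
        if kmer in index:
            row[index[kmer]] = 1
        encoded.append(row)
    return encoded


# Example: a 5-nucleotide sequence yields 4 overlapping dinucleotide windows,
# each encoded as a 16-dimensional one-hot vector.
matrix = high_order_one_hot("ACGUA", order=2)
```

With `order=1` the sketch reduces to conventional one-hot encoding with four channels per position; raising `order` grows the channel count as 4^order, which is the cost of representing nucleotide dependencies explicitly in the input to the CNN.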