LPI-CNNCP: Prediction of lncRNA-protein interactions by using convolutional neural network with the copy-padding trick.

Affiliations

Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China.

Publication information

Anal Biochem. 2020 Jul 15;601:113767. doi: 10.1016/j.ab.2020.113767. Epub 2020 May 23.

Abstract

Long noncoding RNAs (lncRNAs) play critical roles in many pathological and biological processes, such as post-transcriptional regulation, cell differentiation and gene regulation. A growing number of studies have shown that lncRNAs function mainly through interactions with specific RNA-binding proteins (RBPs). However, experimental identification of potential lncRNA-protein interactions is costly and time-consuming. In this work, we propose a novel convolutional neural network (CNN)-based method with the copy-padding trick, named LPI-CNNCP, to predict lncRNA-protein interactions. The copy-padding trick converts variable-length protein/RNA sequences into fixed-length sequences, which makes it possible to construct the CNN model. A high-order one-hot encoding is also applied to transform the protein/RNA sequences into image-like inputs that capture the dependencies among amino acids (or nucleotides). Finally, the encoded protein/RNA sequences are fed into a CNN to predict lncRNA-protein interactions. In a 10-fold cross-validation (10CV) test, LPI-CNNCP shows the best performance among the state-of-the-art methods compared. Results on an independent test demonstrate that LPI-CNNCP can effectively predict potential lncRNA-protein interactions. We also compared the copy-padding trick with two existing alternatives (zero-padding and cropping), and the results show that copy-padding outperforms both in predicting lncRNA-protein interactions. The source code of LPI-CNNCP and the datasets used in this work are available at https://github.com/NWPU-903PR/LPI-CNNCP for academic users.
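The abstract does not give the exact padding or encoding rules, so the following Python is only a minimal sketch of how the two preprocessing ideas might look, not the authors' implementation. It assumes that copy-padding repeats a sequence from its start until the target length is reached, that sequences longer than the target are simply truncated, and that a high-order one-hot encoding assigns one column to every possible k-mer. The function names `copy_pad` and `high_order_one_hot`, the `ACGU` alphabet, and the choice of overlapping k-mers are all hypothetical.

```python
from itertools import product

RNA_ALPHABET = "ACGU"  # assumed alphabet; proteins would use the 20 amino acids


def copy_pad(seq: str, target_len: int) -> str:
    """Repeat seq from its start until it is target_len long.

    Assumption: inputs longer than target_len are truncated. The abstract
    does not say how such inputs are handled.
    """
    if not seq:
        raise ValueError("cannot copy-pad an empty sequence")
    repeats = -(-target_len // len(seq))  # ceiling division
    return (seq * repeats)[:target_len]


def high_order_one_hot(seq: str, order: int, alphabet: str = RNA_ALPHABET):
    """Encode seq as a matrix: one row per overlapping k-mer (k = order),
    one column per possible k-mer over the alphabet."""
    kmers = ["".join(p) for p in product(alphabet, repeat=order)]
    column = {kmer: i for i, kmer in enumerate(kmers)}
    matrix = []
    for start in range(len(seq) - order + 1):
        row = [0] * len(kmers)
        kmer = seq[start:start + order]
        if kmer in column:  # unknown symbols leave the row all-zero
            row[column[kmer]] = 1
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    padded = copy_pad("ACGU", 10)
    encoded = high_order_one_hot(padded, order=2)
    print(padded)
    print(len(encoded), "rows x", len(encoded[0]), "columns")
```

Under these assumptions every sequence yields a matrix of the same shape, which is what allows a CNN with a fixed input size to consume it. Unlike zero-padding, the padded region carries real sequence content rather than blank columns.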
