Department of Computer Science and Engineering, University of California, Riverside, CA, 92521, USA.
College of Information Science and Engineering, Hunan Normal University, Changsha, China.
BMC Bioinformatics. 2021 Jan 18;22(1):24. doi: 10.1186/s12859-020-03914-7.
Long non-coding RNAs (lncRNAs) regulate diverse biological processes via interactions with proteins. Because experimental methods for identifying these interactions are expensive and time-consuming, many computational methods have been proposed. Although these computational methods have achieved promising prediction performance, they neglect the fact that a gene may encode multiple protein isoforms and that different isoforms of the same gene may interact differently with the same lncRNA.
In this study, we propose a novel method, DeepLPI, for predicting the interactions between lncRNAs and protein isoforms. Our method uses sequence and structure data to extract intrinsic features and expression data to extract topological features. To combine these different data, we adopt a hybrid framework that integrates a multimodal deep learning neural network and a conditional random field. To overcome the lack of known interactions between lncRNAs and protein isoforms, we apply a multiple instance learning (MIL) approach. In our experiment on the human lncRNA-protein interactions in the NPInter v3.0 database, DeepLPI improved the prediction performance by 4.7% in terms of AUC and 5.9% in terms of AUPRC over the state-of-the-art methods. Our further correlation analyses between interacting lncRNAs and protein isoforms also showed that their co-expression information helped predict the interactions. Finally, we give some examples where DeepLPI was able to outperform the other methods in predicting mouse lncRNA-protein interactions and novel human lncRNA-protein interactions.
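The MIL idea mentioned above can be illustrated with a minimal sketch. It assumes the standard MIL setting in which a gene is treated as a "bag" of isoform "instances", only the gene-level label is known, and the bag score is the maximum instance score. The abstract does not state which aggregation rule DeepLPI uses, so the max rule, the function names `bag_score` and `witness`, and the isoform scores below are all hypothetical and serve only as an illustration.

```python
from typing import Dict, List


def bag_score(instance_scores: List[float]) -> float:
    """Aggregate isoform-level scores into one gene-level score.

    Assumption (not from the paper): the standard MIL max rule, under which
    a bag is positive if at least one of its instances is positive.
    """
    if not instance_scores:
        raise ValueError("a bag needs at least one isoform score")
    return max(instance_scores)


def witness(isoform_scores: Dict[str, float]) -> str:
    """Return the isoform with the highest score, i.e. the instance that
    determines the bag score under the max rule."""
    if not isoform_scores:
        raise ValueError("a bag needs at least one isoform score")
    return max(isoform_scores, key=isoform_scores.get)


# Hypothetical interaction scores between one lncRNA and the isoforms
# of a single gene (made-up values, for illustration only).
scores = {"isoform_A": 0.12, "isoform_B": 0.81, "isoform_C": 0.40}

gene_level = bag_score(list(scores.values()))
best = witness(scores)
print(gene_level, best)
```

Under this assumed rule, a known gene-level interaction only needs to be explained by one isoform, which is how isoform-level predictions can be trained from gene-level labels.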
Our results demonstrated that the use of isoforms and MIL contributed significantly to the improvement of performance in predicting lncRNA-protein interactions. We believe that such an approach will find more applications in predicting other functional roles of RNAs and proteins.