Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China.
J Chem Inf Model. 2024 Apr 8;64(7):2221-2235. doi: 10.1021/acs.jcim.3c00377. Epub 2023 May 9.
Noncoding RNAs (ncRNAs) play crucial roles in many cellular processes by interacting with proteins, so identifying ncRNA-protein interactions (ncRPIs) is key to understanding ncRNA function. Although a number of computational methods for predicting ncRPIs have been developed, the problem remains challenging. Research on ncRPI prediction has long focused on selecting suitable feature extraction methods and developing deep learning architectures with better recognition performance. In this work, we propose RPI-EDLCN, an ensemble deep learning framework based on a capsule network (CapsuleNet), to predict ncRPIs. As input features, we extracted the sequence features, secondary structure sequence features, motif information, and physicochemical properties of each ncRNA and protein. The sequence and secondary structure sequence features are encoded by the conjoint k-mer method and, combined with the motif information and physicochemical properties, fed into the CapsuleNet-based ensemble model. In this model, the encoded features are processed by a convolutional neural network (CNN), a deep neural network (DNN), and a stacked autoencoder (SAE), and the resulting high-level features are passed to the CapsuleNet for further feature learning. Under 5-fold cross-validation, RPI-EDLCN outperformed other state-of-the-art methods, achieving accuracies of 93.8%, 88.2%, and 91.9% on the RPI1807, RPI2241, and NPInter v2.0 data sets, respectively. Independent tests indicated that RPI-EDLCN can effectively predict potential ncRPIs in different organisms. In addition, RPI-EDLCN successfully predicted hub ncRNAs and proteins in ncRNA-protein networks. Overall, our model can serve as an effective tool for predicting ncRPIs and provides useful guidance for future biological studies.
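To make the conjoint k-mer encoding step concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the formulation common in earlier ncRPI predictors: RNA is encoded as normalized 4-mer frequencies over A/C/G/U (256 dimensions), and protein is first reduced to a 7-class alphabet and then encoded as normalized 3-mer frequencies (343 dimensions). The grouping in `AA_GROUPS`, the values of k, and the function names are all assumptions, since the abstract does not specify them.

```python
from itertools import product

# Assumed 7-class amino-acid grouping (conjoint-triad style).
# The abstract does not state which grouping RPI-EDLCN uses.
AA_GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: str(i) for i, group in enumerate(AA_GROUPS) for aa in group}


def kmer_frequencies(seq, alphabet, k):
    """Return normalized k-mer frequencies of seq over the given alphabet."""
    # Fixed ordering of all len(alphabet)**k possible k-mers.
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = [0] * len(index)
    total = 0
    for start in range(len(seq) - k + 1):
        slot = index.get(seq[start:start + k])
        if slot is not None:  # skip windows containing unknown symbols
            counts[slot] += 1
            total += 1
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def encode_rna(seq, k=4):
    """Encode an RNA sequence as 4**k k-mer frequencies (k=4 is assumed)."""
    return kmer_frequencies(seq.upper().replace("T", "U"), "ACGU", k)


def encode_protein(seq, k=3):
    """Reduce a protein to 7 classes, then encode as 7**k k-mer frequencies."""
    # Residues outside the 20 standard amino acids map to "X" and are skipped.
    reduced = "".join(AA_TO_CLASS.get(aa, "X") for aa in seq.upper())
    return kmer_frequencies(reduced, "0123456", k)


if __name__ == "__main__":
    rna_vec = encode_rna("ACGUACGUAGCU")
    protein_vec = encode_protein("MKVLAAGIVRDE")
    print(len(rna_vec), len(protein_vec))
```

The same `kmer_frequencies` helper could be applied to secondary structure strings by passing a structure alphabet, for example dot-bracket symbols for RNA, although the structure alphabet used in the paper is not given in the abstract.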