Han Yong, Zhang Shao-Wu
MOE Key Laboratory of Information Fusion Technology, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China.
Comput Struct Biotechnol J. 2023 Mar 17;21:2286-2295. doi: 10.1016/j.csbj.2023.03.027. eCollection 2023.
Identifying ncRNA-protein interactions (ncRPIs) through wet-lab experiments remains time-consuming and costly. Although several computational approaches have been developed to predict ncRPIs from the structure and sequence information of ncRNAs and proteins, their prediction accuracy needs improvement and their results lack interpretability. In this work, we propose a novel computational method, ncRPI-LGAT, that predicts ncRNA-protein interactions by transforming the link prediction (i.e., subgraph classification) task into a node classification task in the line network and by introducing a Line Graph ATtention network framework. ncRPI-LGAT first extracts ncRNA/protein attributes using node2vec and then generates the local enclosing subgraph of a target ncRNA-protein pair with SEAL. Because applying pooling operations to local enclosing subgraphs to learn a fixed-size feature vector representing ncRNAs/proteins causes information loss, ncRPI-LGAT converts each local enclosing subgraph into its corresponding line graph, in which each node corresponds to an edge (i.e., an ncRNA-protein pair) of the local enclosing subgraph. The attention-based graph neural network GATv2 is then applied to these line graphs to efficiently learn the embedding features of the target nodes (i.e., ncRNA-protein pairs) by learning the significance of one ncRNA-protein pair to another. The embedding features of an ncRNA-protein pair obtained from multi-head attention are concatenated and fed into a fully connected network to predict ncRPIs. Compared with other state-of-the-art methods in 5-fold cross-validation (5CV) tests, ncRPI-LGAT shows superior performance on three benchmark datasets, demonstrating its effectiveness in predicting ncRNA-protein interactions.
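The subgraph-to-line-graph step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `enclosing_subgraph` and `line_graph`, the `hops` parameter, and the toy ncRNA/protein identifiers are all hypothetical. The sketch also assumes that the target pair is kept as a node of the line graph so that it can be classified. It covers only the graph transformation, not node2vec, SEAL's node labeling, or GATv2.

```python
from collections import defaultdict
from itertools import combinations


def enclosing_subgraph(edges, target, hops=1):
    """Return the edges among nodes within `hops` of either endpoint of `target`.

    Assumption: the target pair itself is appended so it becomes a
    node of the line graph built afterwards.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    keep = set(target)
    frontier = set(target)
    for _ in range(hops):
        # Expand one hop outward from the current frontier.
        frontier = {n for f in frontier for n in adj[f]} - keep
        keep |= frontier
    sub = [(u, v) for u, v in edges if u in keep and v in keep]
    if target not in sub:
        sub.append(target)
    return sub


def line_graph(edges):
    """Map each edge to the set of other edges sharing an endpoint with it."""
    incident = defaultdict(list)
    for e in edges:
        for endpoint in e:
            incident[endpoint].append(e)
    lg = {e: set() for e in edges}
    for es in incident.values():
        for a, b in combinations(es, 2):
            lg[a].add(b)
            lg[b].add(a)
    return lg


# Toy interaction network (hypothetical identifiers): r* = ncRNA, p* = protein.
known = [("r1", "p1"), ("r1", "p2"), ("r2", "p1"), ("r3", "p3")]
target = ("r2", "p2")  # candidate pair to score

sub = enclosing_subgraph(known, target, hops=1)
lg = line_graph(sub)
neighbours_of_target = lg[target]
```

In this toy case the target pair `("r2", "p2")` becomes a line-graph node whose neighbours are the pairs sharing `r2` or `p2` with it. An attention layer over such neighbours is what would let a model weigh how much each neighbouring ncRNA-protein pair matters to the target pair.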