IEEE J Biomed Health Inform. 2024 Apr;28(4):1937-1948. doi: 10.1109/JBHI.2023.3286917. Epub 2024 Apr 4.
Complexes formed by long non-coding RNAs (lncRNAs) bound to proteins help regulate life activities at various stages of an organism. However, as the number of known lncRNAs and proteins grows, verifying LncRNA-Protein Interactions (LPI) through traditional biological experiments is time-consuming and laborious. Improvements in computing power have therefore opened new opportunities for predicting LPI computationally. Building on state-of-the-art work, this article proposes a framework called LncRNA-Protein Interactions based on Kernel Combinations and Graph Convolutional Networks (LPI-KCGCN). We first construct kernel matrices for both lncRNAs and proteins from sequence features, sequence similarity features, expression features, and gene ontology. We then reconstruct these kernel matrices to serve as the input to the next step. Combined with the known LPI, the reconstructed similarity matrices serve as features on the topology graph of the LPI network, and a two-layer Graph Convolutional Network extracts latent representations in the lncRNA and protein spaces. The final prediction matrix is obtained by training the network to produce scoring matrices with respect to lncRNAs and proteins. Different LPI-KCGCN variants are ensembled to derive the final predictions, which are evaluated on balanced and unbalanced datasets. In 5-fold cross-validation, the optimal feature combination achieves an AUC of 0.9714 and an AUPR of 0.9216 on a dataset with 15.5% positive samples. On another highly unbalanced dataset with only 5% positive samples, LPI-KCGCN also outperforms state-of-the-art methods, achieving an AUC of 0.9907 and an AUPR of 0.9267.
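The pipeline described above (kernel construction, kernel combination, and a two-layer graph convolution over the known interaction network) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Everything specific here is an assumption made for illustration: the Gaussian (RBF) kernel, the uniform averaging used in place of the paper's kernel reconstruction step, the block-diagonal feature layout, the inner-product scoring, and all function names, sizes, and random weights. Training, the two separate scoring matrices, and the ensembling of variants are omitted.

```python
import numpy as np

def rbf_kernel(features, gamma=0.1):
    """Gaussian (RBF) kernel between the rows of a feature matrix (assumed kernel choice)."""
    sq = np.sum(features ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def combine_kernels(kernels):
    """Uniform average of kernel matrices (a stand-in for the paper's reconstruction step)."""
    return np.mean(np.stack(kernels), axis=0)

def normalized_adjacency(interactions):
    """Symmetrically normalized adjacency of the bipartite LPI graph, with self-loops."""
    n_l, n_p = interactions.shape
    a = np.zeros((n_l + n_p, n_l + n_p))
    a[:n_l, n_l:] = interactions
    a[n_l:, :n_l] = interactions.T
    a += np.eye(n_l + n_p)                      # self-loops keep every degree >= 1
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(a_hat, x, w1, w2):
    """Two-layer GCN forward pass: A_hat @ ReLU(A_hat @ X @ W1) @ W2."""
    hidden = np.maximum(a_hat @ x @ w1, 0.0)
    return a_hat @ hidden @ w2

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_lnc, n_prot = 4, 3

# Hypothetical feature views (e.g. sequence and expression features); random placeholders.
k_lnc = combine_kernels([rbf_kernel(rng.normal(size=(n_lnc, 5))) for _ in range(2)])
k_prot = combine_kernels([rbf_kernel(rng.normal(size=(n_prot, 5))) for _ in range(2)])

# Known interactions: rows are lncRNAs, columns are proteins (toy data).
known = np.array([[1, 0, 0],
                  [0, 1, 0],
                  [1, 0, 1],
                  [0, 0, 1]], dtype=float)

# Node features: block-diagonal matrix holding the combined kernels.
n = n_lnc + n_prot
x = np.zeros((n, n))
x[:n_lnc, :n_lnc] = k_lnc
x[n_lnc:, n_lnc:] = k_prot

a_hat = normalized_adjacency(known)
w1 = rng.normal(scale=0.1, size=(n, 8))         # untrained weights, for shape only
w2 = rng.normal(scale=0.1, size=(8, 4))
z = gcn_forward(a_hat, x, w1, w2)

# Score every lncRNA-protein pair from the learned embeddings.
scores = sigmoid(z[:n_lnc] @ z[n_lnc:].T)
print(scores.shape)
```

In a real setting the weights would be fitted against the known interaction matrix, and AUC and AUPR would be computed on held-out pairs in each cross-validation fold.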