School of Computer Science and Technology, Dalian University of Technology, Dalian, 116023, Liaoning, China.
School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology, Nairobi, 62000-00200, Kenya.
Mol Genet Genomics. 2020 Sep;295(5):1091-1102. doi: 10.1007/s00438-020-01682-w. Epub 2020 May 15.
Long non-coding RNAs (lncRNAs) play a broad spectrum of distinctive regulatory roles through interactions with proteins. However, only a few plant lncRNAs have been experimentally characterized. We propose GPLPI, a graph representation learning method for predicting plant lncRNA-protein interactions (LPIs) from sequence and structural information. GPLPI employs a generative model based on long short-term memory (LSTM) with graph attention. Evolutionary features are extracted using frequency chaos game representation (FCGR). Manifold regularization and the l-norm are adopted to obtain discriminative feature representations and mitigate overfitting. The model imposes locality-preserving and reconstruction constraints, which lead to better generalization. Finally, potential interactions between lncRNAs and proteins are predicted by integrating CatBoost with regularized logistic regression optimized by the L-BFGS algorithm. The method is trained and tested on Arabidopsis thaliana and Zea mays datasets, on which GPLPI achieves accuracies of 85.76% and 91.97%, respectively. The results show that our method consistently outperforms other state-of-the-art methods.
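Frequency chaos game representation (FCGR) summarizes a sequence as a 2^k x 2^k grid of k-mer frequencies: each cell of the chaos game square corresponds to exactly one k-mer. The sketch below covers nucleotide sequences only. The corner assignment (A, C, G, U/T) is an assumed convention, since conventions vary between papers, and the abstract does not specify how GPLPI adapts FCGR to protein sequences. The function name `fcgr` is hypothetical.

```python
import numpy as np

# Assumed corner convention (varies across papers): A=(0,0), C=(0,1), G=(1,1), U/T=(1,0).
_CORNER = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "U": (1, 0), "T": (1, 0)}


def fcgr(seq: str, k: int = 3) -> np.ndarray:
    """Return a normalized 2^k x 2^k FCGR matrix of k-mer frequencies."""
    size = 2 ** k
    counts = np.zeros((size, size))
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if any(base not in _CORNER for base in kmer):
            continue  # skip windows containing ambiguous bases such as N
        row = col = 0
        for j, base in enumerate(kmer):
            cx, cy = _CORNER[base]
            # The most recent base sets the most significant bit, which matches
            # the cell the iterated CGR point x_n = (x_{n-1} + corner) / 2 lands in.
            col += cx << j
            row += cy << j
        counts[row, col] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts
```

For example, `fcgr("ACGU", k=1)` places one k-mer in each quadrant, so every cell holds 0.25. Counting k-mers directly gives the same cell occupancy as iterating the chaos game and binning the points, but it avoids floating-point boundary issues.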
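The combination of locality-preserving and reconstruction constraints can be illustrated with a generic manifold-regularized objective. This is only a sketch of the standard form, not GPLPI's published loss. Here $X$ is the input, $\hat{X}_\theta$ its reconstruction, $z_i$ the $i$-th row of the embedding $Z$, $W$ a symmetric affinity matrix, and $\lambda_1, \lambda_2$ hypothetical weights. $\Omega(Z)$ stands in for the l-norm penalty named in the abstract, whose exact form is not specified here.

```latex
\min_{\theta}\;
  \underbrace{\lVert X - \hat{X}_\theta \rVert_F^2}_{\text{reconstruction}}
  + \lambda_1 \underbrace{\sum_{i,j} W_{ij}\, \lVert z_i - z_j \rVert_2^2}_{\text{locality preserving}}
  + \lambda_2\, \Omega(Z),
\qquad
\sum_{i,j} W_{ij}\, \lVert z_i - z_j \rVert_2^2 = 2\,\operatorname{tr}\!\left(Z^\top L Z\right),
\quad L = D - W,\; D_{ii} = \sum_j W_{ij}.
```

The identity follows from expanding $\lVert z_i - z_j \rVert_2^2 = \lVert z_i \rVert_2^2 + \lVert z_j \rVert_2^2 - 2 z_i^\top z_j$ and summing with symmetric $W$. Neighbors with large $W_{ij}$ are therefore pulled together in the embedding.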
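The final classifier fits an L2-regularized logistic regression with L-BFGS. The sketch below uses SciPy's `L-BFGS-B` solver. The abstract does not say how the CatBoost and logistic regression outputs are integrated, so the weighted soft-voting step is an assumption. In this sketch `p_other` is a placeholder for probabilities from a second model such as CatBoost. The function names and the `lam` and `alpha` parameters are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit


def fit_logreg_lbfgs(X, y, lam=1.0):
    """L2-regularized logistic regression fitted with L-BFGS; returns weights plus bias."""
    X = np.asarray(X, dtype=float)
    ys = 2.0 * np.asarray(y, dtype=float) - 1.0   # map labels {0,1} -> {-1,+1}
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column

    def loss_and_grad(w):
        z = ys * (Xb @ w)
        loss = np.sum(np.logaddexp(0.0, -z))       # sum of log(1 + e^{-z}), computed stably
        s = expit(-z)                              # sigmoid(-z)
        grad = -Xb.T @ (ys * s)
        w_reg = np.concatenate([w[:-1], [0.0]])    # do not penalize the bias
        return loss + 0.5 * lam * np.sum(w_reg ** 2), grad + lam * w_reg

    res = minimize(loss_and_grad, np.zeros(Xb.shape[1]), jac=True, method="L-BFGS-B")
    return res.x


def predict_proba(X, w):
    """Probability of the positive (interacting) class."""
    X = np.asarray(X, dtype=float)
    return expit(X @ w[:-1] + w[-1])


def soft_vote(p_logreg, p_other, alpha=0.5):
    """Assumed integration step: weighted average of two models' probabilities."""
    return alpha * np.asarray(p_logreg) + (1.0 - alpha) * np.asarray(p_other)
```

A weighted average keeps both models' calibrated scores on the same [0, 1] scale. A stacked meta-classifier would be an equally plausible choice, since the abstract does not pin down the integration method.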