School of Computer Science and Technology, Dalian University of Technology, Dalian 116023, Liaoning, China.
Department of Information Technology, Jomo Kenyatta University of Agriculture and Technology, Nairobi 62000-00200, Kenya.
Cells. 2019 May 30;8(6):521. doi: 10.3390/cells8060521.
The identification and analysis of long non-protein-coding RNAs (lncRNAs) are pervasive in transcriptome studies because of their roles in biological processes. In particular, lncRNA-protein interactions are plausibly relevant to gene expression regulation and to cellular processes such as pathogen resistance in plants. While lncRNA-protein interactions have been studied in animals, they have not yet been studied extensively in plants. In this paper, we propose a novel plant lncRNA-protein interaction prediction method, PLRPIM, which combines deep learning and shallow machine learning methods. Selecting an optimal feature subset and then compressing it efficiently are significant challenges for deep learning models. The proposed method adopts k-mer features and uses a stacked sparse autoencoder to extract high-level abstract sequence-based features. Based on the extracted features, a fusion of random forest (RF) and light gradient boosting machine (LGBM) is used to build the prediction model. Performance is evaluated on two plant datasets, and the experimental results show that PLRPIM outperforms other prediction tools on both. Based on 5-fold cross-validation, we obtain accuracies of 89.98% and 93.44% and AUCs of 0.954 and 0.982 on the two datasets, respectively. PLRPIM effectively predicts potential lncRNA-protein interaction pairs, which can facilitate lncRNA-related research, including function prediction.
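The abstract does not spell out how the k-mer features are computed. As a minimal sketch, assuming the common choice of normalized k-mer frequencies over the RNA alphabet ACGU, the following shows the kind of fixed-length vector that could be fed to a stacked sparse autoencoder. The function name `kmer_frequencies` and the default k = 3 are hypothetical illustrations, not details taken from PLRPIM.

```python
from itertools import product
from typing import List

RNA_ALPHABET = "ACGU"


def kmer_frequencies(seq: str, k: int = 3) -> List[float]:
    """Hypothetical helper: normalized frequency of every possible RNA k-mer.

    The vector has 4 ** k entries in lexicographic order over "ACGU"
    (e.g. AA, AC, ..., UU for k = 2). Windows that contain characters
    outside the alphabet (such as N) are skipped and not counted.
    """
    # Accept lower-case or DNA-style input by mapping T to U.
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(RNA_ALPHABET, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}

    counts = [0] * len(kmers)
    total = 0
    # Slide a window of length k across the sequence, one position at a time.
    for start in range(len(seq) - k + 1):
        i = index.get(seq[start:start + k])
        if i is not None:
            counts[i] += 1
            total += 1

    # Sequences shorter than k (or with no valid window) give an all-zero vector.
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]


if __name__ == "__main__":
    vec = kmer_frequencies("ACGUAC", k=2)
    print(len(vec))  # 16 possible dinucleotides
```

For "ACGUAC" with k = 2, the five windows are AC, CG, GU, UA, AC, so AC receives a frequency of 0.4 and each of the other three observed dinucleotides receives 0.2. Normalizing by the number of windows keeps vectors comparable across lncRNAs of different lengths; a protein-side encoding would need its own alphabet and is not shown here.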