State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, 100193, China.
Department of Hematology, Peking University First Hospital, Beijing, 100034, China.
Plant J. 2023 May;114(4):984-994. doi: 10.1111/tpj.16188. Epub 2023 Mar 29.
The experimentally identified interactome of Arabidopsis (Arabidopsis thaliana) remains far from complete, suggesting that computational prediction methods can complement experimental techniques. Motivated by the success of deep learning algorithms and natural language processing techniques, we introduce DeepAraPPI, an integrative deep learning framework that predicts protein-protein interactions (PPIs) in Arabidopsis from sequence, domain and Gene Ontology (GO) information. DeepAraPPI currently comprises three individual predictors: (i) a word2vec encoding-based Siamese recurrent convolutional neural network (RCNN) model; (ii) a Domain2vec encoding-based multilayer perceptron (MLP) model; and (iii) a GO2vec encoding-based MLP model. A logistic regression model then combines the prediction results of the three predictors. Trained and tested on high-quality positive and negative samples compiled with strict filtering strategies, DeepAraPPI outperforms existing state-of-the-art Arabidopsis PPI prediction methods. It also shows better cross-species predictive ability in rice (Oryza sativa) than traditional machine learning methods, although overall cross-species performance still needs improvement. DeepAraPPI is freely accessible at http://zzdlab.com/deeparappi/, and its source code and data sets are available at https://github.com/zjy1125/DeepAraPPI.
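The final integration step, in which a logistic regression model combines the outputs of the three individual predictors, can be illustrated with a minimal sketch. This is not the DeepAraPPI implementation: the function names (fit_logistic, combine), the toy scores, the learning rate and the number of epochs are all assumptions made for illustration, and the three input columns simply stand in for the scores of the sequence-, domain- and GO-based models.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression weights and bias by batch gradient descent.

    Hypothetical helper for illustration; hyperparameters are arbitrary.
    """
    n_features = len(X[0])
    n = len(X)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sample under the current parameters
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average gradient step on the log-loss
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def combine(scores, w, b):
    """Map three per-model scores to one integrated interaction probability."""
    return sigmoid(sum(wj * sj for wj, sj in zip(w, scores)) + b)

# Made-up scores in the order (sequence model, domain model, GO model).
train_scores = [
    [0.9, 0.8, 0.7],
    [0.8, 0.6, 0.9],
    [0.7, 0.9, 0.8],
    [0.2, 0.3, 0.1],
    [0.1, 0.2, 0.3],
    [0.3, 0.1, 0.2],
]
train_labels = [1, 1, 1, 0, 0, 0]  # 1 = interacting pair, 0 = non-interacting

weights, bias = fit_logistic(train_scores, train_labels)
high = combine([0.85, 0.9, 0.8], weights, bias)
low = combine([0.15, 0.1, 0.2], weights, bias)
print("integrated score, consistent high inputs:", round(high, 3))
print("integrated score, consistent low inputs:", round(low, 3))
```

Stacking the per-model scores in this way lets the combiner learn how much weight each information source deserves, rather than averaging them equally.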