School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China.
Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China.
Proteomics. 2022 Jan;22(1-2):e2100161. doi: 10.1002/pmic.202100161. Epub 2021 Oct 14.
Plant resistance (R) proteins play a significant role in detecting pathogen invasion, and accurately predicting plant R proteins is a key task in phytopathology. Most plant R protein predictors depend on traditional feature extraction methods. Recently, deep representation learning methods have been successfully applied to protein classification problems. Motivated by this, we propose a new computational approach, called prPred-DRLF, which uses deep representation learning feature models to encode amino acid sequences as numerical vectors. The results show that, with a light gradient boosting machine (LGBM) classifier, the fused features of bidirectional long short-term memory (BiLSTM) embedding and unified representation (UniRep) embedding perform better than the other features for plant R protein identification. Evaluated on an independent test, the model achieved an accuracy of 0.956, an F1-score of 0.933, and an area under the receiver operating characteristic (ROC) curve (AUC) of 0.997. Compared with the state-of-the-art prPred and HMMER methods, prPred-DRLF shows an overall improvement in accuracy, F1-score, AUC, and recall. prPred-DRLF is a higher-performance plant R protein prediction tool based on two kinds of deep representation learning technologies, and it offers a user-friendly interface for inspecting possible plant R proteins. We hope that prPred-DRLF will become a useful tool for biological research. A webserver for prPred-DRLF is freely accessible at http://lab.malab.cn/soft/prPred-DRLF. The Python script can be downloaded from https://github.com/Wangys-prog/prPred-DRLF.
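To make the feature fusion and the reported metrics concrete, the following is a minimal sketch and not the prPred-DRLF implementation. It assumes that "fused features" means simple concatenation of the two per-protein embedding vectors, which the abstract does not specify. The function names (fuse_features, classification_metrics, roc_auc) and the toy vectors are hypothetical, and the LGBM classifier itself is left out so that the example runs with the standard library only.

```python
from typing import Dict, List, Sequence


def fuse_features(bilstm_vec: Sequence[float], unirep_vec: Sequence[float]) -> List[float]:
    """Fuse two embeddings of one protein by concatenation (an assumed fusion rule)."""
    return list(bilstm_vec) + list(unirep_vec)


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, recall, and F1-score for binary labels (1 = R protein, 0 = non-R protein)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": (tp + tn) / total, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy vectors standing in for real BiLSTM and UniRep embeddings (hypothetical values).
    fused = fuse_features([0.1, 0.2], [0.3])
    print(len(fused))
    print(classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
    print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

In a full pipeline, the fused vectors would be passed to a gradient boosting classifier, and the predicted labels and scores on the independent test would be fed into metric functions like these.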