prPred-DRLF: Plant R protein predictor using deep representation learning features.

Author information

School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China.

Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China.

Publication information

Proteomics. 2022 Jan;22(1-2):e2100161. doi: 10.1002/pmic.202100161. Epub 2021 Oct 14.

Abstract

Plant resistance (R) proteins play a significant role in the detection of pathogen invasion. Accurately predicting plant R proteins is a key task in phytopathology. Most plant R protein predictors depend on traditional feature extraction methods. Recently, deep representation learning methods have been successfully applied to protein classification problems. Motivated by this, we propose a new computational approach, called prPred-DRLF, which uses deep representation learning feature models to encode the amino acids as numerical vectors. The results show that, with a light gradient boosting machine (LGBM) classifier, the fused features of bidirectional long short-term memory (BiLSTM) embedding and unified representation (UniRep) embedding perform better than other features for plant R protein identification. Evaluated on an independent test, the model achieved an accuracy of 0.956, an F1-score of 0.933, and an area under the receiver operating characteristic (ROC) curve (AUC) of 0.997. Compared with the state-of-the-art prPred and HMMER methods, prPred-DRLF shows an overall improvement in accuracy, F1-score, AUC, and recall. prPred-DRLF is a higher-performance plant R protein prediction tool based on two kinds of deep representation learning technologies and offers a user-friendly interface for inspecting possible plant R proteins. We hope that prPred-DRLF will become a useful tool for biological research. A user-friendly webserver for prPred-DRLF is freely accessible at http://lab.malab.cn/soft/prPred-DRLF. The Python script can be downloaded from https://github.com/Wangys-prog/prPred-DRLF.
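
The abstract names fused embedding features and four evaluation metrics without defining them. The sketch below is illustrative only and is not taken from the prPred-DRLF code: it assumes that "fused" means simple concatenation of the two per-protein vectors (the abstract does not say how fusion is done), and it uses the standard textbook definitions of accuracy, recall, F1-score, and AUC. The function names `fuse_features`, `classification_metrics`, and `roc_auc` are hypothetical.

```python
from typing import Dict, List, Sequence


def fuse_features(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Fuse two per-protein feature vectors.

    Assumption: fusion is plain concatenation; the abstract does not
    specify the actual fusion scheme.
    """
    return list(a) + list(b)


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = R protein)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (pairwise form, O(n^2))."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores, invented purely to exercise the functions.
    labels = [1, 1, 0, 0]
    predicted = [1, 0, 0, 0]
    scores = [0.9, 0.4, 0.5, 0.1]
    print(fuse_features([1.0, 2.0], [3.0]))
    print(classification_metrics(labels, predicted))
    print(roc_auc(labels, scores))
```

In practice the fused vectors would be passed to a gradient-boosting classifier such as LightGBM, and the metrics would be computed on a held-out test set; the toy values here have no relation to the figures reported in the abstract.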
