ToxinPred2: an improved method for predicting toxicity of proteins.

Affiliation

Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India.

Publication information

Brief Bioinform. 2022 Sep 20;23(5). doi: 10.1093/bib/bbac174.

DOI:10.1093/bib/bbac174

PMID:35595541

Abstract

Proteins/peptides have been shown to be promising therapeutic agents for a variety of diseases. However, toxicity is one of the obstacles in protein/peptide-based therapy. The current study describes a web-based tool, ToxinPred2, developed for predicting the toxicity of proteins. This is an update of ToxinPred, which was developed mainly for predicting the toxicity of peptides and small proteins. The method has been trained, tested and evaluated on three datasets curated from the recent release of SwissProt. To provide an unbiased evaluation, we performed internal validation on 80% of the data and external validation on the remaining 20% of the data. We have implemented the following techniques for predicting protein toxicity: (i) Basic Local Alignment Search Tool (BLAST)-based similarity, (ii) Motif-EmeRging and with Classes-Identification (MERCI)-based motif search and (iii) prediction models. Similarity and motif-based techniques achieved a high probability of correct prediction with poor sensitivity/coverage, whereas models based on machine-learning techniques achieved balanced sensitivity and specificity with reasonably high accuracy. Finally, we developed a hybrid method that combined all three approaches and achieved a maximum area under the receiver operating characteristic curve of around 0.99 with a Matthews correlation coefficient of 0.91 on the validation dataset. In addition, we developed models on alternate and realistic datasets. The best machine-learning models have been implemented in the web server named 'ToxinPred2', which is available at https://webs.iiitd.edu.in/raghava/toxinpred2/, with a standalone version at https://github.com/raghavagps/toxinpred2. This is a general method developed for predicting the toxicity of proteins regardless of their source of origin.
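The abstract does not spell out how the three approaches are combined, so the following is only a minimal sketch of one plausible additive scheme, together with the standard formula for the Matthews correlation coefficient used to report performance. The function names, the 0.5 evidence weight and the 0.6 decision threshold are illustrative assumptions and are not taken from the paper.

```python
from math import sqrt


def hybrid_score(ml_prob, blast_hit=None, motif_hit=False, weight=0.5):
    """Combine a machine-learning probability with similarity and motif evidence.

    ml_prob   : probability of toxicity from a trained model (0..1)
    blast_hit : "toxin", "non-toxin", or None when there is no significant hit
    motif_hit : True if a toxin-specific motif was found in the sequence
    weight    : ASSUMED bonus/penalty per piece of evidence (hypothetical value)
    """
    score = ml_prob
    if blast_hit == "toxin":
        score += weight      # similarity to a known toxin raises the score
    elif blast_hit == "non-toxin":
        score -= weight      # similarity to a known non-toxin lowers it
    if motif_hit:
        score += weight      # a toxin-specific motif raises the score
    return score


def is_toxin(score, threshold=0.6):
    """Apply an ASSUMED decision threshold to the hybrid score."""
    return score >= threshold


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient computed from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    s = hybrid_score(0.4, blast_hit="toxin")
    print(s, is_toxin(s))
    print(mcc(tp=50, tn=50, fp=0, fn=0))
```

In this scheme a sequence with a borderline model probability can still be called toxic when BLAST or motif evidence supports it, which mirrors the abstract's observation that similarity and motif searches are precise but cover few sequences, whereas the model covers every sequence.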
