Graduate Group in Bioinformatics, Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA.
Bioinformatics. 2010 Jul 15;26(14):1714-22. doi: 10.1093/bioinformatics/btq267. Epub 2010 May 26.
Granzyme B (GrB) and caspases cleave specific protein substrates to induce apoptosis in virally infected and neoplastic cells. While substrates for both types of proteases have been determined experimentally, there are many more yet to be discovered in humans and other metazoans. Here, we present a bioinformatics method based on support vector machine (SVM) learning that identifies sequence and structural features important for protease recognition of substrate peptides and then uses these features to predict novel substrates. Our approach can act as a convenient hypothesis generator, guiding future experiments by high-confidence identification of peptide-protein partners.
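As an illustration of how a candidate cleavage site might be turned into a numeric input for an SVM, the following minimal sketch one-hot encodes a peptide window and appends structural descriptors. The 8-residue window, the two structural values, and the function names are assumptions made for this example, not the feature set used in the paper. DEVD is used only because it is a well-known caspase recognition motif.

```python
# Hypothetical feature encoding for one candidate cleavage site.
# The window length and structural descriptors are illustrative assumptions.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot_peptide(peptide):
    """Return a flat one-hot vector with 20 entries per residue position."""
    vector = []
    for residue in peptide:
        if residue not in AMINO_ACIDS:
            raise ValueError("non-standard residue: %r" % residue)
        vector.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vector


def encode_site(peptide, structural):
    """Concatenate sequence features with per-site structural features."""
    return one_hot_peptide(peptide) + [float(x) for x in structural]


# Assumed example: an 8-residue window starting with the DEVD motif, plus
# two made-up structural values (relative solvent accessibility, loop flag).
features = encode_site("DEVDGSAA", [0.62, 1.0])
print(len(features))
```

Vectors of this kind, labeled as substrate or non-substrate, could then be passed to any SVM implementation for training.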
The method is benchmarked on the known substrates of both protease types, including our literature-curated GrB substrate set (GrBah). On these benchmark sets, the method outperforms a number of other methods that consider sequence only, predicting at a 0.87 true positive rate (TPR) and a 0.13 false positive rate (FPR) for caspase substrates, and a 0.79 TPR and a 0.21 FPR for GrB substrates. The method is then applied to approximately 25 000 proteins in the human proteome to generate a ranked list of predicted substrates of each protease type. Two of these predictions, AIF-1 and SMN1, were selected for further experimental analysis, and each was validated as a GrB substrate.
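For readers unfamiliar with the reported metrics, the sketch below shows how a TPR and FPR pair is obtained from classifier scores at a single decision threshold. The labels, scores, and threshold are toy values invented for this example and are unrelated to the benchmark sets described above.

```python
# Toy computation of true and false positive rates at one score threshold.
# All data here are made up for illustration.


def rates_at_threshold(labels, scores, threshold):
    """Return (TPR, FPR) when scores >= threshold are called positive."""
    tp = fp = fn = tn = 0
    for label, score in zip(labels, scores):
        predicted = score >= threshold
        if label and predicted:
            tp += 1
        elif label:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    # Guard against empty classes to avoid division by zero.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = known substrate, 0 = non-substrate
scores = [0.9, 0.8, 0.7, 0.2, 0.6, 0.3, 0.1, 0.05]  # hypothetical SVM scores
tpr, fpr = rates_at_threshold(labels, scores, threshold=0.5)
print(tpr, fpr)
```

Sweeping the threshold over all observed scores traces out a ROC curve, from which a single operating point such as the ones quoted above can be chosen.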
All predictions for both protease types are publicly available at http://salilab.org/peptide. A web server at the same site allows users to train new SVM models and make predictions for any protein that recognizes specific oligopeptide ligands.