Computational Biology & Bioinformatics, Pacific Northwest National Laboratory, USA.
Bioinformatics. 2010 Jul 1;26(13):1677-83. doi: 10.1093/bioinformatics/btq251.
The standard approach to identifying peptides based on accurate mass and elution time (AMT) compares profiles obtained from a high-resolution mass spectrometer to a database of peptides previously identified in tandem mass spectrometry (MS/MS) studies. It would be advantageous, with respect to both accuracy and cost, to search only for those peptides that are detectable by MS (proteotypic peptides).
We present a support vector machine (SVM) model that uses a simple descriptor space based on 35 properties of amino acid content, charge, hydrophilicity and polarity for the quantitative prediction of proteotypic peptides. Using three independently derived AMT databases (Shewanella oneidensis, Salmonella typhimurium, Yersinia pestis) for training and validation within and across species, the SVM achieved an average accuracy measure of approximately 0.83 with a standard deviation below 0.038. Furthermore, we demonstrate that these results are achievable with a small set of 13 variables and that the approach can achieve high proteome coverage.
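To make the idea of a sequence-derived descriptor space concrete, the following minimal sketch computes a handful of content, charge, polarity and hydropathy features for a single peptide. It is an illustration only: the function name `peptide_descriptors`, the six features and the residue groupings are assumptions chosen for the example, not the 35 properties used in the paper, and the Kyte-Doolittle hydropathy index stands in for whichever hydrophilicity scale the authors used. A vector of this kind is what would be passed to an SVM classifier.

```python
from collections import Counter

# Kyte-Doolittle hydropathy index (higher = more hydrophobic).
# Used here as a stand-in scale; the paper's exact scale is not stated in the abstract.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Illustrative residue groupings (assumptions, not taken from the paper).
POSITIVE = "KR"    # basic side chains; histidine is ignored for simplicity
NEGATIVE = "DE"    # acidic side chains
POLAR = "STNQ"     # uncharged polar side chains


def peptide_descriptors(sequence):
    """Return a small dictionary of hypothetical descriptors for one peptide."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty peptide sequence")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")

    n = len(seq)
    counts = Counter(seq)
    n_pos = sum(counts[aa] for aa in POSITIVE)
    n_neg = sum(counts[aa] for aa in NEGATIVE)
    n_polar = sum(counts[aa] for aa in POLAR)

    return {
        "length": n,
        "frac_positive": n_pos / n,
        "frac_negative": n_neg / n,
        # Crude side-chain charge count; termini and pH effects are ignored.
        "net_charge": n_pos - n_neg,
        "frac_polar": n_polar / n,
        "mean_hydropathy": sum(KYTE_DOOLITTLE[aa] for aa in seq) / n,
    }


if __name__ == "__main__":
    # Arbitrary example sequence, not drawn from any AMT database.
    for name, value in peptide_descriptors("ACDK").items():
        print(f"{name}: {value}")
```

Keeping each descriptor a simple function of the sequence means the features can be computed for every candidate peptide in a proteome before any MS data are acquired, which is what a pre-filtering step for an AMT database search would require.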
http://omics.pnl.gov/software/STEPP.php
Supplementary data are available at Bioinformatics online.