Shao Chen, Sun Wei, Li Fuxin, Yang Ruifeng, Zhang Ling, Gao Youhe
Department of Physiology and Pathophysiology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine, Peking Union Medical College, Beijing, China.
J Mass Spectrom. 2009 Jan;44(1):25-31. doi: 10.1002/jms.1466.
Tandem mass spectrometry (MS/MS) is widely used in proteomics. Multiple algorithms have been developed to assess matches between MS/MS spectra and peptide sequences in databases. However, reducing the false negative rate without compromising the high confidence of peptide identification remains a challenge. In this study, we developed a score, Oscore, by logistic regression on SEQUEST and AMASS variables to identify fully tryptic peptides. Because these variables showed complex associations with one another, combining them in a single model rather than applying them in a threshold model improved the classification of correct and incorrect peptide identifications. Oscore achieved both a lower false negative rate and a lower false positive rate than PeptideProphet on datasets from 18 known protein mixtures and from several proteome-scale samples that differed in complexity, database size and separation method. The main contributor to the improvement made by Oscore is discussed on the basis of a three-way comparison among Oscore, PeptideProphet and another logistic regression model built on PeptideProphet's variables.
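The general idea of combining several match scores in one logistic regression, instead of cutting on a single score, can be sketched as follows. This is a minimal illustration on synthetic data, not the Oscore model: the two features `score_a` and `score_b`, their distributions, and the training settings are all assumptions made for the example, and no SEQUEST or AMASS output is read.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def standardize(rows):
    # Scale each column to zero mean and unit variance so one learning rate suits all.
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def fit_logistic(X, y, lr=0.5, epochs=300):
    # Full-batch gradient descent on the mean log-loss.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err
            for j in range(d):
                gw[j] += err * xi[j]
        b -= lr * gb / n
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
    return w, b

def predict_proba(w, b, xi):
    # Estimated probability that a peptide-spectrum match is correct.
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))

def accuracy(pred, y):
    return sum(int(p == t) for p, t in zip(pred, y)) / len(y)

def best_single_threshold_accuracy(values, y):
    # Best accuracy achievable by calling a match correct when one score >= cutoff.
    return max(accuracy([int(v >= c) for v in values], y)
               for c in sorted(set(values)))

# Synthetic matches (hypothetical): label 1 = correct, 0 = incorrect.
rng = random.Random(0)
rows, labels = [], []
for _ in range(500):
    rows.append([rng.gauss(2.5, 0.7), rng.gauss(0.25, 0.10)])  # score_a, score_b
    labels.append(1)
    rows.append([rng.gauss(1.5, 0.7), rng.gauss(0.10, 0.10)])
    labels.append(0)

X = standardize(rows)
w, b = fit_logistic(X, labels)
probs = [predict_proba(w, b, xi) for xi in X]

combined_acc = accuracy([int(p >= 0.5) for p in probs], labels)
single_acc = max(best_single_threshold_accuracy([r[j] for r in rows], labels)
                 for j in range(2))
print(f"combined model accuracy:   {combined_acc:.3f}")
print(f"best single-score cutoff:  {single_acc:.3f}")
```

Both accuracies are measured on the training data, so the comparison is only indicative. A real evaluation would use held-out spectra and report false negative and false positive rates separately, as the abstract does.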