Dong Nai-ping, Liang Yi-Zeng, Xu Qing-song, Mok Daniel K W, Yi Lun-zhao, Lu Hong-mei, He Min, Fan Wei
College of Chemistry and Chemical Engineering and School of Mathematics and Statistics, Central South University, Changsha, 410083, P. R. China.
Anal Chem. 2014 Aug 5;86(15):7446-54. doi: 10.1021/ac501094m. Epub 2014 Jul 25.
Accurate prediction of peptide fragment ion mass spectra is one of the critical factors in guaranteeing confident peptide identification by protein sequence database search in bottom-up proteomics. To predict this type of mass spectrum accurately and comprehensively, a framework named MS(2)PBPI is proposed. MS(2)PBPI first extracts fragment ions from large-scale MS/MS spectral data sets according to the peptide fragmentation pathways and uses binary trees to divide the resulting bulky data into tens to more than 1000 regions. For each adequate region, a stochastic gradient boosting tree regression model is constructed. By constructing hundreds of these models, MS(2)PBPI is able to predict MS/MS spectra for unmodified and modified peptides with reasonable accuracy. Moreover, high consistency is achieved between predicted and experimental MS/MS spectra derived from different ion trap instruments with low and high resolving power. MS(2)PBPI outperforms the existing algorithms MassAnalyzer and PeptideART.
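The partition-then-regress idea in the abstract can be illustrated with a small, self-contained sketch. It is not the MS(2)PBPI implementation: the abstract does not say how the binary trees choose their splits, which features describe a fragment ion, or what makes a region "adequate". The sketch therefore assumes median splits, two made-up numeric features, a hypothetical minimum region size (`min_region`), and one-split stumps as base learners. All function names are illustrative.

```python
import random
from statistics import median


def avg(values):
    return sum(values) / len(values)


def fit_stump(X, r):
    """Fit a one-split regression stump to residuals r by least squares."""
    best = None
    for j in range(len(X[0])):
        # Skipping the smallest value keeps both sides of every split non-empty.
        for t in sorted(set(x[j] for x in X))[1:]:
            left = [ri for x, ri in zip(X, r) if x[j] < t]
            right = [ri for x, ri in zip(X, r) if x[j] >= t]
            lm, rm = avg(left), avg(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # every feature is constant: predict the mean residual
        m = avg(r)
        return lambda x: m
    _, j, t, lm, rm = best
    return lambda x: lm if x[j] < t else rm


def fit_sgb(X, y, n_rounds=30, learning_rate=0.1, subsample=0.5, rng=None):
    """Stochastic gradient boosting for squared loss (stump base learners)."""
    rng = rng or random.Random(0)
    base = avg(y)
    pred = [base] * len(y)
    stumps = []
    k = max(1, int(subsample * len(y)))
    for _ in range(n_rounds):
        # "Stochastic": each round sees a random subsample drawn without replacement.
        idx = rng.sample(range(len(y)), k)
        stump = fit_stump([X[i] for i in idx], [y[i] - pred[i] for i in idx])
        stumps.append(stump)
        pred = [p + learning_rate * stump(x) for p, x in zip(pred, X)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def build_regions(X, y, depth=0, max_depth=2, min_region=20):
    """Binary tree with median splits (an assumption); each leaf is one region."""
    j = depth % len(X[0])
    t = median(x[j] for x in X)
    left = [i for i, x in enumerate(X) if x[j] < t]
    right = [i for i, x in enumerate(X) if x[j] >= t]
    if depth == max_depth or not left or not right:
        # Only "adequate" regions (here: enough rows) get their own boosted model.
        model = fit_sgb(X, y) if len(y) >= min_region else None
        return {"model": model, "mean": avg(y)}
    return {
        "feature": j,
        "threshold": t,
        "left": build_regions([X[i] for i in left], [y[i] for i in left],
                              depth + 1, max_depth, min_region),
        "right": build_regions([X[i] for i in right], [y[i] for i in right],
                               depth + 1, max_depth, min_region),
    }


def predict(node, x):
    """Route x to its region, then use that region's model (or its mean)."""
    while "model" not in node:
        node = node["left"] if x[node["feature"]] < node["threshold"] else node["right"]
    return node["model"](x) if node["model"] else node["mean"]


# Synthetic stand-in data: two hypothetical features, one intensity-like target.
data_rng = random.Random(42)
X = [(data_rng.random(), data_rng.random()) for _ in range(200)]
y = [(1.0 if a > 0.5 else 0.2) + 0.5 * b + data_rng.gauss(0, 0.05) for a, b in X]

tree = build_regions(X, y)
mse_model = avg([(predict(tree, x) - t) ** 2 for x, t in zip(X, y)])
mse_mean = avg([(avg(y) - t) ** 2 for t in y])
print(f"per-region boosted models: {mse_model:.4f}, global mean: {mse_mean:.4f}")
```

Fitting one small model per region, rather than a single global model, lets each model specialize on a homogeneous subset of the data. With `max_depth=2` the sketch produces at most four regions. The framework described above works at a much larger scale, with tens to more than 1000 regions and hundreds of models.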