Frank Ari M
Department of Computer Science and Engineering, University of California, San Diego (UCSD), 9500 Gilman Drive, La Jolla, California 92093-0404, USA.
J Proteome Res. 2009 May;8(5):2226-40. doi: 10.1021/pr800677f.
Accurate modeling of peptide fragmentation is necessary for the development of robust scoring functions for peptide-spectrum matches, which are the cornerstone of MS/MS-based identification algorithms. Unfortunately, peptide fragmentation is a complex process that can involve several competing chemical pathways, which makes it difficult to develop generative probabilistic models that describe it accurately. However, the vast amounts of MS/MS data now being generated make it possible to use data-driven machine learning methods to develop discriminative ranking-based models that predict the intensity ranks of a peptide's fragment ions. We use simple sequence-based features that are combined by a boosting algorithm into models that predict peak ranks with high accuracy. In an accompanying manuscript, we demonstrate how these prediction models are used to significantly improve the performance of peptide identification algorithms. The models can also be useful in the design of optimal multiple reaction monitoring (MRM) transitions in cases where there is insufficient experimental data to guide the peak selection process. The prediction algorithm can also be run independently through PepNovo+, which is available for download from http://bix.ucsd.edu/Software/PepNovo.html.
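To make the idea of rank prediction from sequence features concrete, the following is a minimal sketch, not the models described in the abstract and not the PepNovo+ interface. It enumerates singly charged b and y ions of a peptide, derives a few simple sequence-based features for each cleavage site, scores each ion with an additive sum of weak rules (the general form a boosted model takes), and converts the scores into predicted intensity ranks. All function names, the feature set, the rules, and the weights are hypothetical and hand-set for illustration; a real model would learn them from MS/MS training data.

```python
# Monoisotopic residue masses in Da (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON, WATER = 1.007276, 18.010565


def fragment_ions(peptide):
    """Return (label, m/z, features) for singly charged b and y ions."""
    n = len(peptide)
    ions = []
    for i in range(1, n):  # cleavage between residues i-1 and i
        feats = {
            "pos": i / n,             # relative cleavage position
            "n_res": peptide[i - 1],  # residue N-terminal to the cleavage
            "c_res": peptide[i],      # residue C-terminal to the cleavage
        }
        b_mz = sum(MONO[a] for a in peptide[:i]) + PROTON
        y_mz = sum(MONO[a] for a in peptide[i:]) + WATER + PROTON
        ions.append((f"b{i}", b_mz, dict(feats, ion="b")))
        ions.append((f"y{n - i}", y_mz, dict(feats, ion="y")))
    return ions


# Hypothetical weak rules as (predicate, weight) pairs. The weights are
# invented for this sketch; a boosting algorithm would learn them from data.
WEAK_RULES = [
    (lambda f: f["c_res"] == "P", 1.0),         # cleavage N-terminal to proline
    (lambda f: f["n_res"] in "DE", 0.5),        # cleavage after an acidic residue
    (lambda f: f["ion"] == "y", 0.4),           # ion type
    (lambda f: 0.3 <= f["pos"] <= 0.7, 0.3),    # cleavage near the middle
]


def score(feats):
    """Additive score: sum of the weights of all weak rules that fire."""
    return sum(weight for rule, weight in WEAK_RULES if rule(feats))


def predict_ranks(peptide):
    """Map each ion label to its predicted intensity rank (1 = strongest)."""
    ordered = sorted(fragment_ions(peptide), key=lambda ion: -score(ion[2]))
    return {label: rank for rank, (label, _, _) in enumerate(ordered, start=1)}


if __name__ == "__main__":
    for label, rank in sorted(predict_ranks("PEPTIDE").items(), key=lambda kv: kv[1]):
        print(rank, label)
```

In this toy setup the output is only an ordering of fragment ions, which is what a ranking-based model targets: it does not attempt to predict absolute intensities. Such an ordering could be compared against the observed peak ranks in a spectrum, or used to shortlist candidate fragments when no experimental spectrum is available.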