Mining tandem mass spectral data to develop a more accurate mass error model for peptide identification.

Author information

Fu Yan, Gao Wen, He Simin, Sun Ruixiang, Zhou Hu, Zeng Rong

Affiliation

Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China.

Publication information

Pac Symp Biocomput. 2007:421-32.

PMID:17990507

Abstract

The assumed mass error distribution of fragment ions plays a crucial role in peptide identification from tandem mass spectra. Previous mass error models are simplistic uniform or normal distributions with empirically set parameter values. In this paper, we propose a more accurate mass error model, namely the conditional normal model, and an iterative parameter learning algorithm. The new model is based on two important observations on the mass error distribution: a linear relationship between the mean of the mass error and the ion mass, and a log-log linear relationship between the standard deviation of the mass error and the peak intensity. To our knowledge, the latter quantitative relationship has not been reported before. Experimental results demonstrate that our approach accurately quantifies the mass error distribution and that the new model improves the accuracy of peptide identification.
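
The two relationships named in the abstract can be written as a conditional normal density: the mass error of a fragment ion with mass m and peak intensity I is treated as normal with mean a*m + b and standard deviation sigma, where log(sigma) = c*log(I) + d. The abstract does not describe the iterative learning algorithm itself, so the following is only a minimal sketch of one plausible alternating scheme, not the authors' procedure. The function names, the parameter names a, b, c, d, the equal-count intensity binning, and the use of NumPy are all assumptions made for illustration.

```python
import numpy as np


def fit_conditional_normal(mass, intensity, error, n_iter=10, n_bins=10):
    """Hypothetical alternating fit (an assumption, not the paper's algorithm) of
        error ~ Normal(a * mass + b, sigma**2),
        log(sigma) = c * log(intensity) + d.
    Returns the tuple (a, b, c, d).
    """
    mass = np.asarray(mass, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    error = np.asarray(error, dtype=float)

    log_i = np.log(intensity)
    # Start from a constant standard deviation for every peak.
    sigma = np.full_like(error, error.std())
    # Equal-count bins over intensity, used to estimate the spread per bin.
    bins = np.array_split(np.argsort(log_i), n_bins)
    design = np.column_stack([mass, np.ones_like(mass)])

    for _ in range(n_iter):
        # Step 1: weighted least squares for the mean line a * mass + b,
        # weighting each peak by 1 / sigma**2.
        sw = 1.0 / sigma
        a, b = np.linalg.lstsq(design * sw[:, None], error * sw, rcond=None)[0]

        # Step 2: residual spread per intensity bin, then a straight-line
        # fit of log(std) against log(intensity).
        resid = error - (a * mass + b)
        bin_log_i = np.array([log_i[idx].mean() for idx in bins])
        bin_log_sd = np.array([np.log(resid[idx].std()) for idx in bins])
        c, d = np.polyfit(bin_log_i, bin_log_sd, 1)

        # Update the per-peak standard deviation for the next pass.
        sigma = np.exp(c * log_i + d)

    return a, b, c, d


def log_density(error, mass, intensity, a, b, c, d):
    """Log of the conditional normal density of a mass error."""
    mu = a * mass + b
    sigma = np.exp(c * np.log(intensity) + d)
    return -0.5 * np.log(2.0 * np.pi) - np.log(sigma) - 0.5 * ((error - mu) / sigma) ** 2
```

Under these assumptions, a scoring function could call log_density for each matched fragment peak in place of a fixed-tolerance uniform or fixed-variance normal term, so that the penalty for a given mass deviation depends on the ion mass and the peak intensity.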
