Fu Yan, Jia Wei, Lu Zhuang, Wang Haipeng, Yuan Zuofei, Chi Hao, Li You, Xiu Liyun, Wang Wenping, Liu Chao, Wang Leheng, Sun Ruixiang, Gao Wen, Qian Xiaohong, He Si-Min
Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, PR China.
BMC Bioinformatics. 2009 Jan 30;10 Suppl 1(Suppl 1):S50. doi: 10.1186/1471-2105-10-S1-S50.
Peptide identification via tandem mass spectrometry is a basic task of current proteomics research. Because mass spectra are complex, most of them cannot be interpreted at present. A major reason is the presence of unexpected or unknown protein post-translational modifications.
This paper describes an efficient, sequence database-independent approach to detecting abundant post-translational modifications in high-accuracy peptide mass spectra. The approach is based on the observation that the spectra of a modified peptide and its unmodified counterpart are correlated in both peptide mass and retention time. Peptide mass differences that occur frequently in a data set suggest possible modifications, while small and consistent retention time differences provide orthogonal supporting evidence. We propose using a bivariate Gaussian mixture model to discriminate modification-related spectral pairs from random ones. Because two-dimensional information is used, accurate modification masses and confident spectral pairs can be determined, as well as the quantitative influence of each modification on peptide retention time.
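To make the idea of separating modification-related pairs from random ones more concrete, the following is a minimal sketch and not the authors' implementation. It assumes each spectral pair is summarized by a point (mass difference, retention time difference), models true pairs with a single bivariate Gaussian and random pairs with a uniform background over the search window, and fits the two by expectation-maximization. The function names, the uniform background, the window size, and the toy values (a cluster near 15.9949 Da and -1.0 min) are all illustrative assumptions.

```python
import math

def gaussian2d_pdf(x, y, mean, cov):
    """Bivariate Gaussian density; cov is given as (sxx, sxy, syy)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x - mean[0], y - mean[1]
    # Quadratic form using the closed-form inverse of a 2x2 matrix.
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))

def weighted_moments(points, weights):
    """Weighted mean and covariance of 2-D points (tiny floor on variances)."""
    total = sum(weights)
    mx = sum(w * p[0] for w, p in zip(weights, points)) / total
    my = sum(w * p[1] for w, p in zip(weights, points)) / total
    sxx = sum(w * (p[0] - mx) ** 2 for w, p in zip(weights, points)) / total + 1e-12
    syy = sum(w * (p[1] - my) ** 2 for w, p in zip(weights, points)) / total + 1e-12
    sxy = sum(w * (p[0] - mx) * (p[1] - my) for w, p in zip(weights, points)) / total
    return (mx, my), (sxx, sxy, syy)

def fit_pair_mixture(points, window_area, n_iter=100):
    """EM for: weight * Gaussian (true pairs) + (1 - weight) * uniform (random pairs)."""
    background = 1.0 / window_area           # assumed uniform density of random pairs
    weight = 0.5                             # initial guess for the fraction of true pairs
    mean, cov = weighted_moments(points, [1.0] * len(points))
    resp = [0.5] * len(points)
    for _ in range(n_iter):
        # E-step: posterior probability that each pair is modification-related.
        resp = []
        for x, y in points:
            g = weight * gaussian2d_pdf(x, y, mean, cov)
            resp.append(g / (g + (1.0 - weight) * background))
        # M-step: re-estimate mixing weight, mean, and covariance.
        weight = sum(resp) / len(points)
        mean, cov = weighted_moments(points, resp)
    return weight, mean, cov, resp

# Toy data (hypothetical): 60 pairs clustered near (15.9949 Da, -1.0 min) ...
cluster = [(15.9949 + 0.0003 * a, -1.0 + 0.1 * (b - 5.5))
           for a in range(-2, 3) for b in range(12)]
# ... plus 40 "random" pairs spread over a 0.1 Da x 50 min window.
noise = [(15.95 + 0.01 * i, rt) for i in range(10) for rt in (-20.0, -7.0, 7.0, 20.0)]
points = cluster + noise

weight, mean, cov, resp = fit_pair_mixture(points, window_area=0.1 * 50.0)
print("fraction of true pairs:", round(weight, 3))
print("estimated mass shift (Da) and RT shift (min):", mean)
```

In this sketch the fitted mean plays the role of the modification mass and its retention time effect, and the per-pair responsibilities in `resp` give a confidence score for each spectral pair. A real data set would need one such fit per candidate mass difference, or a mixture with several Gaussian components.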
Experiments on two glycoprotein data sets demonstrate that our method can effectively detect abundant modifications and spectral pairs. By including the discovered modifications in the database search, or by propagating peptide assignments between paired spectra, an average of 10% more spectra are interpreted.