Park Kunsoo, Yoon Joo Young, Lee Sunho, Paek Eunok, Park Heejin, Jung Hee-Jung, Lee Sang-Won
School of Computer Science and Engineering, Seoul National University, Seoul, Korea.
Anal Chem. 2008 Oct 1;80(19):7294-303. doi: 10.1021/ac800913b. Epub 2008 Aug 28.
Determining isotopic clusters and their monoisotopic masses is the first step in interpreting complex mass spectra generated by high-resolution mass spectrometers. We propose a mathematical model for the isotopic distributions of polypeptides and an effective interpretation algorithm. Our model uses two types of ratios: the intensity ratio of two adjacent peaks and the intensity ratio product of three adjacent peaks in an isotopic distribution. Both ratios can be approximated as simple functions of polypeptide mass, and their values fall within ranges that depend on that mass. Given a spectrum as a peak list, our algorithm first finds all isotopic clusters consisting of two or more peaks. It then scores the clusters using the ranges of the ratio functions and computes the monoisotopic masses of the identified clusters. We applied our method to high-resolution mass spectra obtained from a Fourier transform ion cyclotron resonance (FTICR) mass spectrometer coupled to reverse-phase liquid chromatography (RPLC). For polypeptides whose amino acid sequences had been identified by tandem mass spectrometry (MS/MS), we applied both THRASH-based software implementations and our method. When the total number of clusters identified by each method was fixed, our method found more masses of known peptides. Experimental results show that our method performed better for isotopic clusters of weak intensity, whose isotopic distributions deviate significantly from their theoretical distributions. It also correctly identified some isotopic clusters that THRASH-based implementations did not find, especially those for which THRASH gave 1 Da mismatches. Another advantage of our method is speed: it is much faster than THRASH, which computes a least-squares fit.
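The two ratio types and the cluster-finding step can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the isotope spacing constant, the m/z tolerance, and the greedy chaining strategy are all assumptions made for illustration, and the ratio product is assumed to take the form I[k] * I[k+2] / I[k+1]^2, which the abstract does not specify. The mass-dependent ratio ranges used for scoring are not given in the abstract and are therefore left out.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)

# Assumption: approximate average spacing between isotopic peaks, in Da.
ISOTOPE_SPACING = 1.00235


def adjacent_ratios(intensities: List[float]) -> List[float]:
    """Intensity ratio of each pair of adjacent peaks: I[k+1] / I[k]."""
    return [intensities[k + 1] / intensities[k]
            for k in range(len(intensities) - 1)]


def ratio_products(intensities: List[float]) -> List[float]:
    """Ratio product over each triple of adjacent peaks.

    Assumed form (not stated in the abstract): I[k] * I[k+2] / I[k+1]**2.
    """
    return [intensities[k] * intensities[k + 2] / intensities[k + 1] ** 2
            for k in range(len(intensities) - 2)]


def find_clusters(peaks: List[Peak], charge: int,
                  tol: float = 0.01) -> List[List[Peak]]:
    """Greedily chain peaks spaced by ISOTOPE_SPACING / charge.

    `peaks` must be sorted by m/z. Only chains of two or more peaks are
    kept. This is a simplification: each peak joins at most one cluster.
    """
    spacing = ISOTOPE_SPACING / charge
    clusters: List[List[Peak]] = []
    used = set()
    for i, (mz, _) in enumerate(peaks):
        if i in used:
            continue
        chain = [i]
        expected = mz + spacing
        for j in range(i + 1, len(peaks)):
            if peaks[j][0] > expected + tol:
                break  # no later peak can match the expected position
            if abs(peaks[j][0] - expected) <= tol:
                chain.append(j)
                expected = peaks[j][0] + spacing
        if len(chain) >= 2:
            used.update(chain)
            clusters.append([peaks[k] for k in chain])
    return clusters


# Hypothetical doubly charged cluster plus one unrelated peak.
example_peaks = [(500.0, 100.0), (500.50118, 60.0),
                 (501.00235, 20.0), (600.0, 5.0)]
example_clusters = find_clusters(example_peaks, charge=2)
example_intensities = [intensity for _, intensity in example_clusters[0]]
example_ratios = adjacent_ratios(example_intensities)
example_products = ratio_products(example_intensities)
```

A scoring step in the spirit of the abstract would compare `example_ratios` and `example_products` with the mass-dependent ranges of the ratio functions; those ranges would have to come from the full paper.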