Zou An-Min, Shi Jinhong, Ding Jiarui, Wu Fang-Xiang
Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, Canada.
IEEE Trans Inf Technol Biomed. 2010 May;14(3):552-8. doi: 10.1109/TITB.2010.2040287. Epub 2010 Jan 29.
A single mass spectrometry experiment can produce hundreds of thousands of tandem mass spectra, and several search engines have been developed to interpret them. All search engines need to determine the masses of peptide ions from their mass-to-charge (m/z) ratios. Unfortunately, mass spectrometers do not detect the charge states of the ions. One current strategy is to search candidate peptides multiple times, once for each possible charge state (typically +2 or +3). However, this strategy not only wastes search time but also increases the risk of false-positive peptide identifications. This paper aims to discriminate doubly charged spectra from triply charged ones. Twenty-eight features are introduced to describe the discriminating characteristics of doubly and triply charged spectra, and the support vector machine (SVM) technique is used to train a classifier on these 28 features. To verify the proposed method, computational experiments are conducted on two types of datasets: the ISB dataset, generated on a low-resolution ion-trap instrument, and the TOV dataset, generated on a high-resolution quadrupole time-of-flight instrument. For each type of dataset, SVM-based classifiers are trained and tested on 20 randomly sampled subdatasets. The results show that the proposed method achieves average correct classification rates of 95% and 93% in discriminating doubly charged from triply charged spectra for the low-resolution ISB dataset and the high-resolution TOV dataset, respectively.
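The sketch below illustrates, under stated assumptions, the two ideas in the abstract. First, `neutral_mass` shows why an unknown charge matters: for an ion [M+zH]^z+, the neutral mass is M = z·(m/z − m_proton), so one precursor m/z yields a different candidate mass for each assumed charge. Second, it evaluates a linear SVM averaged over 20 random train/test splits, echoing the "20 randomly sampled subdatasets". The abstract does not list the 28 features, the kernel, or the split ratio, so everything here is hypothetical: the features are two-dimensional synthetic toy data, the 50/50 split is an assumption, and the hand-rolled Pegasos-style linear SVM stands in for whatever SVM implementation the authors actually used. All function names are illustrative.

```python
import random

PROTON_MASS = 1.007276  # proton mass in Da


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral peptide mass implied by a precursor m/z under an assumed charge z."""
    return charge * (mz - PROTON_MASS)


def train_linear_svm(X, y, lam=0.01, iters=2000, seed=0):
    """Pegasos-style stochastic sub-gradient training of a linear SVM (hinge loss).

    Labels are +1/-1. A constant 1.0 is appended to each feature vector so the
    last weight acts as a (regularized) bias term.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    for t in range(1, iters + 1):
        i = rng.randrange(len(data))
        eta = 1.0 / (lam * t)
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))  # uses old w
        w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage step
        if margin < 1.0:  # hinge loss active: move toward this example
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w, x):
    """Return +1 (hypothetical 'doubly charged') or -1 ('triply charged')."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def mean_accuracy_over_splits(X, y, n_splits=20, train_frac=0.5, seed=0):
    """Average test accuracy over repeated random train/test splits (assumed 50/50)."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    accs = []
    for s in range(n_splits):
        rng.shuffle(idx)
        cut = int(train_frac * len(idx))
        train, test = idx[:cut], idx[cut:]
        w = train_linear_svm([X[i] for i in train], [y[i] for i in train], seed=s)
        correct = sum(predict(w, X[i]) == y[i] for i in test)
        accs.append(correct / len(test))
    return sum(accs) / len(accs)


def make_toy_data(n_per_class=200, seed=42):
    """Synthetic 2-D stand-in features; NOT the paper's 28 spectral features."""
    gen = random.Random(seed)
    X, y = [], []
    for _ in range(n_per_class):
        X.append([gen.gauss(2.0, 1.0), gen.gauss(2.0, 1.0)])
        y.append(1)  # hypothetical "+2" class
        X.append([gen.gauss(-2.0, 1.0), gen.gauss(-2.0, 1.0)])
        y.append(-1)  # hypothetical "+3" class
    return X, y


if __name__ == "__main__":
    print(neutral_mass(500.0, 2), neutral_mass(500.0, 3))
    X, y = make_toy_data()
    print(mean_accuracy_over_splits(X, y))
```

With a precursor at m/z 500.0, the +2 and +3 hypotheses imply neutral masses of about 997.99 Da and 1496.98 Da, which is why a search engine without a charge call must search both. The accuracy printed for the toy data says nothing about the paper's reported 95% and 93%; it only shows the repeated-split evaluation pattern.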