Zou An-Min, Wu Fang-Xiang, Ding Jia-Rui, Poirier Guy G
Department of Mechanical Engineering, University of Saskatchewan, 57 Campus Dr, Saskatoon, SK, S7N 5A9, Canada.
BMC Bioinformatics. 2009 Jan 30;10 Suppl 1(Suppl 1):S49. doi: 10.1186/1471-2105-10-S1-S49.
Tandem mass spectrometry has become particularly useful for the rapid identification and characterization of the protein components of complex biological mixtures. Powerful database search methods, such as SEQUEST and MASCOT, have been developed for peptide identification; they compare the mass spectra obtained from unknown proteins or peptides with theoretically predicted spectra derived from protein databases. However, the majority of spectra generated in a mass spectrometry experiment are of too poor quality to be interpreted, while some high-quality spectra cannot be interpreted by one method but may be interpretable by others. A filtering algorithm that removes poor-quality spectra prior to the database search is therefore appealing.
This paper proposes a support vector machine (SVM) based approach to assessing the quality of tandem mass spectra. Each mass spectrum is mapped to 16 proposed features that describe its quality. Based on the results from SEQUEST, four SVM classifiers that take the 16 features as input are trained and tested on the ISB data and the TOV data, respectively. The superior performance of the proposed SVM classifiers is illustrated both by comparison with existing classifiers and by validation against MASCOT search results.
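The two steps described here, mapping a spectrum to a feature vector and training an SVM on labelled vectors, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not list the 16 features, so the three features in `spectrum_features` are hypothetical stand-ins, and the classifier is a plain linear SVM trained by stochastic sub-gradient descent on the hinge loss, whereas the abstract does not say which kernel or solver the paper used. All function names and parameter values are invented for the example.

```python
import random


def spectrum_features(peaks):
    """Map a peak list [(m/z, intensity), ...] to a toy feature vector.

    ASSUMPTION: these three features are hypothetical stand-ins for the
    paper's 16 features, which the abstract does not enumerate.
    """
    if not peaks:
        return [0.0, 0.0, 0.0]
    mz = [m for m, _ in peaks]
    intensity = [i for _, i in peaks]
    total = sum(intensity)
    base_peak_share = max(intensity) / total if total > 0 else 0.0
    # [number of peaks, share of signal in the strongest peak, m/z span]
    return [float(len(peaks)), base_peak_share, max(mz) - min(mz)]


def decision_value(w, b, x):
    """Signed score w.x + b; larger means 'more likely high quality'."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200, seed=0):
    """Stochastic sub-gradient descent on the L2-regularised hinge loss.

    X is a list of feature vectors; y holds +1 (high quality) or -1 (poor).
    lam, eta and epochs are illustrative defaults, not tuned values.
    """
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * decision_value(w, b, X[i])
            # Shrink the weights: gradient step on the L2 penalty.
            w = [wj * (1.0 - eta * lam) for wj in w]
            if margin < 1.0:
                # Hinge loss is active: move the boundary toward this sample.
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b


def predict(w, b, x):
    """Return +1 for 'high quality' and -1 for 'poor quality'."""
    return 1 if decision_value(w, b, x) >= 0.0 else -1
```

In a setting like the one described, the labels `y` would come from a search engine's results (for example, whether SEQUEST assigned a confident peptide to the spectrum), and the feature vectors would normally be rescaled to comparable ranges before training, since raw features such as peak count and m/z span differ by orders of magnitude.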
The proposed method can be used to effectively remove poor-quality spectra before spectral searching, and also to find more peptides or post-translationally modified peptides from high-quality spectra using different search engines or de novo methods.
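The two uses named in this conclusion can be sketched as a small triage routine. This is only an illustration under assumptions: the function names, the zero threshold, the spectrum identifiers, and the quality scores are all invented for the example, and the scores are taken as given (for instance, the signed output of a trained quality classifier).

```python
def filter_before_search(quality, threshold=0.0):
    """Return ids of spectra whose quality score reaches the threshold.

    quality maps a spectrum id to a classifier score (hypothetical values).
    Spectra below the threshold are dropped before the database search.
    """
    return sorted(sid for sid, score in quality.items() if score >= threshold)


def second_pass_candidates(kept_ids, identified_ids):
    """High-quality spectra that the first search engine left unassigned.

    These are the candidates for a different search engine or a de novo method.
    """
    identified = set(identified_ids)
    return [sid for sid in kept_ids if sid not in identified]


# Invented toy scores, for illustration only.
quality = {"scan_001": 1.7, "scan_002": -0.9, "scan_003": 0.4, "scan_004": -2.1}

kept = filter_before_search(quality)
retry = second_pass_candidates(kept, identified_ids={"scan_001"})
print(kept)
print(retry)
```

Keeping the two steps separate reflects the order of operations: the quality filter runs before any search, while the second-pass list can only be built once the first search has reported which spectra it identified.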