Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China.
BMC Bioinformatics. 2011 Aug 17;12:346. doi: 10.1186/1471-2105-12-346.
The analysis of mass spectra suggests that the existence of derivative peaks is strongly dependent on the intensity of the primary peaks. Peak selection from tandem mass spectra is used to filter out noise and contaminant peaks. It is widely accepted that a valid primary peak tends to have high intensity and to be accompanied by derivative peaks, including isotopic peaks, neutral loss peaks, and complementary peaks. Existing models for peak selection ignore the dependence between the existence of the derivative peaks and the intensity of the primary peaks. Simple models for peak selection assume that these two attributes are independent; however, this assumption is contrary to real data and prone to error.
In this paper, we present a statistical model, named ProbPS, that quantitatively measures the dependence of a derivative peak's existence on the primary peak's intensity, and we describe a peak selection method built on this model. Our results show that this quantitative understanding can successfully guide the peak selection process. By comparing ProbPS with AuDeNS, we demonstrate the advantages of our method both in filtering out noise peaks and in improving de novo identification. In addition, we present a tag identification approach based on our peak selection method. Our results on a test data set suggest that our tag identification method (876 correct tags in 1000 spectra) outperforms PepNovoTag (790 correct tags in 1000 spectra).
We have shown that ProbPS improves the accuracy of peak selection, which in turn enhances the performance of de novo sequencing and tag identification. Thus, our model saves valuable computation time and improves the accuracy of the results.
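As a rough illustration of the dependence described above, the following sketch estimates how often a peak is accompanied by an isotopic peak, grouped by the intensity rank of the primary peak. It is not the ProbPS model itself. The function names, the rank-based binning, the 0.02 Da tolerance, and the singly charged assumption are all hypothetical choices made for this example.

```python
from bisect import bisect_left

# Approximate 13C - 12C mass difference in Da; assumes singly charged ions.
ISOTOPE_SHIFT = 1.00335


def has_peak_near(sorted_mz, target, tol):
    """Return True if any m/z in sorted_mz lies within tol of target."""
    i = bisect_left(sorted_mz, target - tol)
    return i < len(sorted_mz) and sorted_mz[i] <= target + tol


def isotope_rate_by_intensity_bin(peaks, n_bins=2, tol=0.02):
    """Estimate P(isotopic peak exists | intensity bin of the primary peak).

    peaks: list of (mz, intensity) tuples from one spectrum.
    Bin 0 holds the most intense peaks, bin n_bins - 1 the weakest.
    """
    sorted_mz = sorted(mz for mz, _ in peaks)
    ranked = sorted(peaks, key=lambda p: p[1], reverse=True)
    hits = [0] * n_bins
    totals = [0] * n_bins
    for rank, (mz, _) in enumerate(ranked):
        b = rank * n_bins // len(ranked)  # equal-sized bins by intensity rank
        totals[b] += 1
        if has_peak_near(sorted_mz, mz + ISOTOPE_SHIFT, tol):
            hits[b] += 1
    return [h / t if t else 0.0 for h, t in zip(hits, totals)]


# Toy spectrum (made-up values): two intense peaks with isotopic partners,
# followed by four weak peaks without partners.
peaks = [
    (100.0, 1000), (101.003, 300),
    (200.0, 900), (201.004, 250),
    (300.0, 50), (350.0, 40), (400.0, 30), (450.0, 20),
]
rates = isotope_rate_by_intensity_bin(peaks)
print(rates)
```

If the rates differ markedly between bins on real spectra, that would be consistent with the claim that the two attributes are not independent. A fuller treatment would also cover neutral loss and complementary peaks and would pool counts over many spectra.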