Department of Pathology, Vanderbilt University School of Medicine, Nashville, Tennessee 37232, United States.
J Proteome Res. 2013 Mar 1;12(3):1108-19. doi: 10.1021/pr300631t. Epub 2013 Feb 12.
Liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) has revolutionized the proteomic analysis of protein complexes, cells, and tissues. In a typical proteomic analysis, each tandem mass spectrum from an LC-MS/MS experiment is assigned to a peptide by a search engine that compares the experimental MS/MS data to theoretical peptide sequences in a protein database. The resulting peptide-spectrum matches (PSMs) are then used to infer a list of proteins identified in the original sample. However, search engines often fail to distinguish between correct and incorrect peptide assignments. In this study, we designed and implemented a novel algorithm called De-Noise that reduces the number of incorrect peptide matches and maximizes the number of correct peptides at a fixed false discovery rate, using a minimal number of scoring outputs from the SEQUEST search engine. The algorithm uses a three-step process: data cleaning, data refining through an SVM-based decision function, and a final refining step based on proteolytic peptide patterns. Using proteomics data generated on different types of mass spectrometers, we optimized De-Noise according to the resolution and mass accuracy of the mass spectrometer employed in the LC-MS/MS experiment. Our results demonstrate that De-Noise improves peptide identification compared with other methods used to process the PSMs assigned by SEQUEST. Because De-Noise uses a limited number of scoring attributes, it can easily be implemented with other search engines.
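The three-step process can be illustrated with a minimal Python sketch. It is not the De-Noise implementation: the precursor tolerance, the linear weights and bias (standing in for a trained SVM), the choice of SEQUEST XCorr and ΔCn as features, the fully tryptic rule, and target-decoy FDR estimation are all assumptions made for illustration, and every name below (`PSM`, `clean`, `decision_value`, `is_fully_tryptic`, `filter_at_fdr`, `denoise_sketch`) is hypothetical.

```python
from dataclasses import dataclass

# Hypothetical parameters for illustration only; they are NOT the values used by De-Noise.
PPM_TOLERANCE = 10.0   # precursor tolerance; in practice would depend on instrument mass accuracy
WEIGHTS = (1.0, 4.0)   # linear weights for (XCorr, DeltaCn), standing in for a trained SVM
BIAS = -2.5            # intercept of the linear decision function


@dataclass
class PSM:
    peptide: str      # peptide sequence, e.g. "LVNELTEFAK"
    prev_aa: str      # residue before the peptide in the protein ("-" = protein N-terminus)
    next_aa: str      # residue after the peptide in the protein ("-" = protein C-terminus)
    xcorr: float      # SEQUEST cross-correlation score
    delta_cn: float   # SEQUEST normalized difference to the next-best candidate
    ppm_error: float  # precursor mass error in ppm
    is_decoy: bool    # True if the match comes from a decoy (e.g. reversed) database


def clean(psms, tol=PPM_TOLERANCE):
    """Step 1 (assumed form): drop PSMs whose precursor error exceeds the tolerance."""
    return [p for p in psms if abs(p.ppm_error) <= tol]


def decision_value(p):
    """Step 2 (assumed form): linear decision function w.x + b, as a linear SVM yields."""
    return WEIGHTS[0] * p.xcorr + WEIGHTS[1] * p.delta_cn + BIAS


def is_fully_tryptic(p):
    """Step 3 (assumed form): trypsin cleaves after K/R, so both termini must follow
    that rule or be protein termini. The 'no cleavage before P' rule is ignored here."""
    n_ok = p.prev_aa in ("K", "R", "-")
    c_ok = p.peptide[-1] in ("K", "R") or p.next_aa == "-"
    return n_ok and c_ok


def filter_at_fdr(psms, max_fdr=0.01):
    """Rank by decision value and keep the targets in the largest top-ranked set whose
    decoy/target ratio (a target-decoy FDR estimate) is at most max_fdr."""
    ranked = sorted(psms, key=decision_value, reverse=True)
    targets = decoys = cutoff = 0
    for i, p in enumerate(ranked, start=1):
        if p.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = i
    return [p for p in ranked[:cutoff] if not p.is_decoy]


def denoise_sketch(psms, max_fdr=0.01):
    """Apply the cleaning and proteolytic filters, then rank survivors by the decision
    value and threshold at a fixed FDR. Both filters act on each PSM independently,
    so applying them before ranking gives the same set as the clean/score/filter order."""
    cleaned = clean(psms)
    tryptic = [p for p in cleaned if is_fully_tryptic(p)]
    return filter_at_fdr(tryptic, max_fdr)
```

Only two per-PSM scores feed the decision function here, echoing the abstract's point that a small number of scoring attributes makes such a filter easy to port to other search engines; in a real setting the weights would be learned rather than fixed by hand.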