A novel algorithm for validating peptide identification from a shotgun proteomics search engine.

Affiliation

Department of Pathology, Vanderbilt University School of Medicine, Nashville, Tennessee 37232, United States.

Publication information

J Proteome Res. 2013 Mar 1;12(3):1108-19. doi: 10.1021/pr300631t. Epub 2013 Feb 12.

Abstract

Liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) has revolutionized the proteomic analysis of complexes, cells, and tissues. In a typical proteomic analysis, the tandem mass spectra from an LC-MS/MS experiment are assigned to peptides by a search engine that compares the experimental MS/MS peptide data to theoretical peptide sequences in a protein database. The peptide-spectrum matches are then used to infer a list of identified proteins in the original sample. However, search engines often fail to distinguish between correct and incorrect peptide assignments. In this study, we designed and implemented a novel algorithm called De-Noise to reduce the number of incorrect peptide matches and maximize the number of correct peptides at a fixed false discovery rate using a minimal number of scoring outputs from the SEQUEST search engine. The algorithm uses a three-step process: data cleaning, data refining through an SVM-based decision function, and a final data refining step based on proteolytic peptide patterns. Using proteomics data generated on different types of mass spectrometers, we optimized the De-Noise algorithm on the basis of the resolution and mass accuracy of the mass spectrometer employed in the LC-MS/MS experiment. Our results demonstrate that De-Noise improves peptide identification compared to other methods used to process the peptide sequence matches assigned by SEQUEST. Because De-Noise uses a limited number of scoring attributes, it can be easily implemented with other search engines.
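
The abstract does not say how the false discovery rate is estimated or how the fixed-rate cutoff is applied. As a rough illustration only, the sketch below assumes the common target-decoy convention, in which each peptide-spectrum match (PSM) carries a score and a flag marking whether it hit a decoy sequence. The function name `filter_at_fdr` and the `(score, is_decoy)` tuple layout are hypothetical. This is not the De-Noise algorithm itself.

```python
from typing import List, Tuple

# A PSM is modeled here as (score, is_decoy); this layout is an assumption
# made for illustration and is not taken from the paper.
PSM = Tuple[float, bool]


def filter_at_fdr(psms: List[PSM], max_fdr: float = 0.01) -> List[PSM]:
    """Return target PSMs from the largest top-scoring prefix whose
    estimated FDR (decoys / targets) does not exceed max_fdr."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    best_cut = 0  # length of the longest acceptable prefix found so far
    for i, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Simple target-decoy estimate of the FDR at this score cutoff.
        if targets > 0 and decoys / targets <= max_fdr:
            best_cut = i
    # Report only target hits; decoys exist solely to estimate the error rate.
    return [p for p in ranked[:best_cut] if not p[1]]


example = [(5.0, False), (4.0, False), (3.0, True), (2.0, False), (1.0, True)]
strict = filter_at_fdr(example, max_fdr=0.0)
loose = filter_at_fdr(example, max_fdr=0.5)
```

With this kind of filter, any rescoring step that pushes decoy hits lower in the ranking lets more target PSMs pass at the same FDR threshold, which is the sense in which a method can "maximize the number of correct peptides at a fixed false discovery rate".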

Cited by

1
A cost-sensitive online learning method for peptide identification.
BMC Genomics. 2020 Apr 25;21(1):324. doi: 10.1186/s12864-020-6693-y.
2
An adaptive classification model for peptide identification.
BMC Genomics. 2015;16 Suppl 11(Suppl 11):S1. doi: 10.1186/1471-2164-16-S11-S1. Epub 2015 Nov 10.
