A two-step database search method improves sensitivity in peptide sequence matches for metaproteomics and proteogenomics studies.

Affiliation

Minnesota Supercomputing Institute, Minneapolis, MN, USA.

Publication information

Proteomics. 2013 Apr;13(8):1352-7. doi: 10.1002/pmic.201200352. Epub 2013 Mar 15.

Abstract

Large databases (>10^6 sequences) used in metaproteomic and proteogenomic studies present challenges in matching peptide sequences to MS/MS data using database-search programs. Most notably, strict filtering to avoid false-positive matches leads to more false negatives, thus constraining the number of peptide matches. To address this challenge, we developed a two-step method wherein matches derived from a primary search against a large database were used to create a smaller subset database. The second search was performed against a target-decoy version of this subset database merged with a host database. High confidence peptide sequence matches were then used to infer protein identities. Applying our two-step method for both metaproteomic and proteogenomic analysis resulted in twice the number of high confidence peptide sequence matches in each case, as compared to the conventional one-step method. The two-step method captured almost all of the same peptides matched by the one-step method, with a majority of the additional matches being false negatives from the one-step method. Furthermore, the two-step method improved results regardless of the database search program used. Our results show that our two-step method maximizes the peptide matching sensitivity for applications requiring large databases, especially valuable for proteogenomics and metaproteomics studies.
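The database-construction step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `build_second_step_db`, the `DECOY_` prefix, the use of reversed sequences as decoys, and the choice to generate decoys for the merged host-plus-subset targets are all assumptions made for illustration. The first-pass search itself (run by an external search program) is represented only by a list of matched protein accessions.

```python
from typing import Dict, Iterable


def build_second_step_db(
    large_db: Dict[str, str],
    first_pass_hits: Iterable[str],
    host_db: Dict[str, str],
    decoy_prefix: str = "DECOY_",  # hypothetical naming convention
) -> Dict[str, str]:
    """Build a target-decoy database for the second search.

    large_db        -- accession -> sequence for the large database
    first_pass_hits -- accessions matched in the primary search
    host_db         -- accession -> sequence for the host database
    """
    hit_ids = set(first_pass_hits)

    # Subset database: only large-database entries matched in step one.
    subset = {acc: seq for acc, seq in large_db.items() if acc in hit_ids}

    # Merge the subset with the host database to form the target set.
    targets = {**host_db, **subset}

    # Assumption: decoys are reversed target sequences, one per target.
    decoys = {decoy_prefix + acc: seq[::-1] for acc, seq in targets.items()}

    return {**targets, **decoys}


# Toy data, invented for illustration only.
large_db = {"mic1": "MKTAYIAK", "mic2": "MSLLTEVE", "mic3": "MAHHHHHH"}
host_db = {"hostA": "MDDIAALV"}
db = build_second_step_db(large_db, ["mic1", "mic3", "mic1"], host_db)
print(sorted(db))
```

Because the second search runs against far fewer sequences than the original large database, the same false discovery rate threshold can admit more true matches, which is the sensitivity gain the abstract reports.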
