Integrated Proteomic Pipeline Using Multiple Search Engines for a Proteogenomic Study with a Controlled Protein False Discovery Rate.

Author information

Park Gun Wook, Hwang Heeyoun, Kim Kwang Hoe, Lee Ju Yeon, Lee Hyun Kyoung, Park Ji Yeong, Ji Eun Sun, Park Sung-Kyu Robin, Yates John R, Kwon Kyung-Hoon, Park Young Mok, Lee Hyoung-Joo, Paik Young-Ki, Kim Jin Young, Yoo Jong Shin

Affiliations

Biomedical Omics Group, Korea Basic Science Institute, 162 YeonGuDanji-Ro, Ochang 363-883, Republic of Korea.

Graduate School of Analytical Science and Technology, Chungnam National University, Daejeon 34134, Republic of Korea.

Publication information

J Proteome Res. 2016 Nov 4;15(11):4082-4090. doi: 10.1021/acs.jproteome.6b00376. Epub 2016 Aug 30.

DOI:10.1021/acs.jproteome.6b00376

PMID:27537616

Abstract

In the Chromosome-Centric Human Proteome Project (C-HPP), false-positive identification by peptide spectrum matches (PSMs) after database searches is a major issue for proteogenomic studies that use large-scale proteomic profiling based on liquid chromatography and mass spectrometry. Here we developed a simple strategy for protein identification, with a controlled false discovery rate (FDR) at the protein level, using an integrated proteomic pipeline (IPP) that consists of the following four steps. First, individual proteomic searches were performed against the neXtProt database using three different search engines: SEQUEST, MASCOT, and MS-GF+. Second, the search results from the PSMs were combined using statistical evaluation tools including DTASelect and Percolator. Third, the peptide search scores were converted into normalized E-scores using an in-house program. Last, ProteinInferencer was used to filter the proteins containing two or more peptides with a controlled FDR of 1.0% at the protein level. We then compared the performance of the IPP to that of a conventional proteomic pipeline (CPP) for protein identification using a controlled FDR of <1% at the protein level. Using the IPP, a total of 5756 proteins (vs 4453 using the CPP), including 477 alternative splicing variants (vs 182 using the CPP), were identified from human hippocampal tissue. In addition, a total of 10 missing proteins (vs 7 using the CPP) were identified with two or more unique peptides, and their tryptic peptides were validated using MS/MS spectral patterns from a repository database or their corresponding synthetic peptides. This study shows that the IPP effectively improved the identification of proteins, including alternative splicing variants and missing proteins, in human hippocampal tissues for the C-HPP. All RAW files used in this study were deposited in ProteomeXchange (PXD000395).
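The final filtering step described in the abstract (proteins with two or more peptides, FDR controlled at the protein level) can be illustrated with a generic target-decoy sketch. This is not ProteinInferencer and not the authors' in-house code: the function name `filter_proteins`, the `DECOY_` prefix, the summed-score protein ranking, and the simple decoys/targets FDR estimate are all assumptions made for illustration only.

```python
from collections import defaultdict


def filter_proteins(psms, max_fdr=0.01, min_peptides=2, decoy_prefix="DECOY_"):
    """Hypothetical protein-level target-decoy filter (illustration only).

    psms: iterable of (protein_id, peptide_sequence, score) tuples,
          where a higher score means a better match.
    Returns the accepted target protein IDs, best-scoring first.
    """
    # Keep the best score seen for each distinct peptide of each protein.
    best = defaultdict(dict)
    for protein, peptide, score in psms:
        previous = best[protein].get(peptide)
        if previous is None or score > previous:
            best[protein][peptide] = score

    # Require at least `min_peptides` distinct peptides per protein.
    # Assumed protein score: sum of the best peptide scores.
    ranked = sorted(
        (
            (sum(peptides.values()), protein)
            for protein, peptides in best.items()
            if len(peptides) >= min_peptides
        ),
        reverse=True,
    )

    # Walk down the ranking and remember the deepest cut at which the
    # estimated FDR (decoys / targets) is still within the limit.
    targets = decoys = cut = 0
    for i, (_, protein) in enumerate(ranked, start=1):
        if protein.startswith(decoy_prefix):
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cut = i

    return [p for _, p in ranked[:cut] if not p.startswith(decoy_prefix)]


# Toy input: P3 has only one peptide, so it is dropped before ranking.
demo_psms = [
    ("P1", "PEPTIDEA", 5.0), ("P1", "PEPTIDEB", 4.0),
    ("P2", "PEPTIDEC", 3.0), ("P2", "PEPTIDED", 3.0),
    ("DECOY_X", "EDITPEPA", 2.0), ("DECOY_X", "EDITPEPB", 2.0),
    ("P3", "PEPTIDEE", 10.0),
    ("P4", "PEPTIDEF", 1.0), ("P4", "PEPTIDEG", 1.0),
]
accepted = filter_proteins(demo_psms)
```

With the toy data, the first decoy protein ranks below P1 and P2, so a 1% limit accepts only those two; loosening `max_fdr` admits lower-ranked targets. Real tools estimate protein-level error with more careful scoring and protein grouping than this sketch does.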

Similar articles

Chromosome-Based Proteomic Study for Identifying Novel Protein Variants from Human Hippocampal Tissue Using Customized neXtProt and GENCODE Databases.

J Proteome Res. 2015 Dec 4;14(12):5028-37. doi: 10.1021/acs.jproteome.5b00472. Epub 2015 Nov 16.

Next Generation Proteomic Pipeline for Chromosome-Based Proteomic Research Using NeXtProt and GENCODE Databases.

J Proteome Res. 2017 Dec 1;16(12):4425-4434. doi: 10.1021/acs.jproteome.7b00223. Epub 2017 Oct 13.

Combination of Multiple Spectral Libraries Improves the Current Search Methods Used to Identify Missing Proteins in the Chromosome-Centric Human Proteome Project.

J Proteome Res. 2015 Dec 4;14(12):4959-66. doi: 10.1021/acs.jproteome.5b00578. Epub 2015 Sep 14.

Optimization of Search Engines and Postprocessing Approaches to Maximize Peptide and Protein Identification for High-Resolution Mass Data.

J Proteome Res. 2015 Nov 6;14(11):4662-73. doi: 10.1021/acs.jproteome.5b00536. Epub 2015 Sep 30.

Flexible Data Analysis Pipeline for High-Confidence Proteogenomics.

J Proteome Res. 2016 Dec 2;15(12):4686-4695. doi: 10.1021/acs.jproteome.6b00765. Epub 2016 Nov 10.

Evaluating the effect of database inflation in proteogenomic search on sensitive and reliable peptide identification.

BMC Genomics. 2016 Dec 22;17(Suppl 13):1031. doi: 10.1186/s12864-016-3327-5.

Systematic Comparison of False-Discovery-Rate-Controlling Strategies for Proteogenomic Search Using Spike-in Experiments.

J Proteome Res. 2017 Jun 2;16(6):2231-2239. doi: 10.1021/acs.jproteome.7b00033. Epub 2017 May 5.

Using the entrapment sequence method as a standard to evaluate key steps of proteomics data analysis process.

BMC Genomics. 2017 Mar 14;18(Suppl 2):143. doi: 10.1186/s12864-017-3491-2.

Computational and Mass-Spectrometry-Based Workflow for the Discovery and Validation of Missing Human Proteins: Application to Chromosomes 2 and 14.

J Proteome Res. 2015 Sep 4;14(9):3621-34. doi: 10.1021/pr5010345. Epub 2015 Jul 22.

Cited by

Exploring the Alternative Proteome with OpenProt and Mass Spectrometry.

Methods Mol Biol. 2024;2836:3-17. doi: 10.1007/978-1-0716-4007-4_1.

Analysis of the mechanism of alleviating Alzheimer's disease based on transcriptomics and proteomics.

Korean J Physiol Pharmacol. 2024 Jul 1;28(4):361-377. doi: 10.4196/kjpp.2024.28.4.361.

Unsupervised Mining of HLA-I Peptidomes Reveals New Binding Motifs and Potential False Positives in the Community Database.

Front Immunol. 2022 Mar 21;13:847756. doi: 10.3389/fimmu.2022.847756. eCollection 2022.

Critical Assessment of MetaProteome Investigation (CAMPI): a multi-laboratory comparison of established workflows.

Nat Commun. 2021 Dec 15;12(1):7305. doi: 10.1038/s41467-021-27542-8.

Psoriasis to Psoriatic Arthritis: The Application of Proteomics Technologies.

Front Med (Lausanne). 2021 Nov 16;8:681172. doi: 10.3389/fmed.2021.681172. eCollection 2021.

Proteogenomic Analysis Provides Novel Insight into Genome Annotation and Nitrogen Metabolism in Nostoc sp. PCC 7120.

Microbiol Spectr. 2021 Oct 31;9(2):e0049021. doi: 10.1128/Spectrum.00490-21. Epub 2021 Sep 15.

Identification and differential expression of serotransferrin and apolipoprotein A-I in the plasma of HIV-1 patients treated with first-line antiretroviral therapy.

BMC Infect Dis. 2020 Nov 27;20(1):898. doi: 10.1186/s12879-020-05610-6.

Pollination Drop Proteome and Reproductive Organ Transcriptome Comparison in Gnetum Reveals Entomophilous Adaptation.

Genes (Basel). 2019 Oct 12;10(10):800. doi: 10.3390/genes10100800.

Beyond Genes: Re-Identifiability of Proteomic Data and Its Implications for Personalized Medicine.

Genes (Basel). 2019 Sep 5;10(9):682. doi: 10.3390/genes10090682.

The Novel Cerato-Platanin-Like Protein FocCP1 from Fusarium oxysporum Triggers an Immune Response in Plants.

Int J Mol Sci. 2019 Jun 11;20(11):2849. doi: 10.3390/ijms20112849.
