Integrated Proteomic Pipeline Using Multiple Search Engines for a Proteogenomic Study with a Controlled Protein False Discovery Rate.

Author information

Park Gun Wook, Hwang Heeyoun, Kim Kwang Hoe, Lee Ju Yeon, Lee Hyun Kyoung, Park Ji Yeong, Ji Eun Sun, Park Sung-Kyu Robin, Yates John R, Kwon Kyung-Hoon, Park Young Mok, Lee Hyoung-Joo, Paik Young-Ki, Kim Jin Young, Yoo Jong Shin

Affiliations

Biomedical Omics Group, Korea Basic Science Institute, 162 YeonGuDanji-Ro, Ochang 363-883, Republic of Korea.

Graduate School of Analytical Science and Technology, Chungnam National University, Daejeon 34134, Republic of Korea.

Publication information

J Proteome Res. 2016 Nov 4;15(11):4082-4090. doi: 10.1021/acs.jproteome.6b00376. Epub 2016 Aug 30.

Abstract

In the Chromosome-Centric Human Proteome Project (C-HPP), false-positive peptide spectrum matches (PSMs) from database searches are a major issue for proteogenomic studies that rely on large-scale proteomic profiling by liquid chromatography and mass spectrometry. Here we developed a simple strategy for protein identification with a controlled false discovery rate (FDR) at the protein level, using an integrated proteomic pipeline (IPP) that consists of four steps. First, individual proteomic searches were performed against the neXtProt database using three different search engines: SEQUEST, MASCOT, and MS-GF+. Second, the PSM search results were combined using statistical evaluation tools including DTASelect and Percolator. Third, the peptide search scores were converted into normalized E-scores using an in-house program. Fourth, ProteinInferencer was used to filter for proteins containing two or more peptides with a controlled FDR of 1.0% at the protein level. We then compared the performance of the IPP with that of a conventional proteomic pipeline (CPP) for protein identification using a controlled FDR of <1% at the protein level. Using the IPP, a total of 5756 proteins (vs 4453 using the CPP), including 477 alternative splicing variants (vs 182 using the CPP), were identified from human hippocampal tissue. In addition, a total of 10 missing proteins (vs 7 using the CPP) were identified with two or more unique peptides, and their tryptic peptides were validated using MS/MS spectral patterns from a repository database or their corresponding synthetic peptides. This study shows that the IPP effectively improved the identification of proteins, including alternative splicing variants and missing proteins, in human hippocampal tissues for the C-HPP. All RAW files used in this study were deposited in ProteomeXchange (PXD000395).
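The final filtering step (proteins with two or more peptides, FDR of 1.0% at the protein level) can be illustrated with a minimal sketch. This is not ProteinInferencer and not the authors' implementation. It assumes a generic target-decoy estimate of the FDR, which the abstract does not specify. It also assumes a protein score defined as the sum of the best score per distinct peptide, and a hypothetical input format of `(protein_id, peptide, score, is_decoy)` tuples in which higher scores are better. The function name `filter_proteins` is likewise invented for illustration.

```python
from collections import defaultdict


def filter_proteins(psms, max_fdr=0.01, min_peptides=2):
    """Return target proteins passing a protein-level FDR cutoff.

    psms: iterable of (protein_id, peptide, score, is_decoy) tuples.
    Assumptions (not from the paper): higher score is better, and FDR is
    estimated as (#decoy proteins) / (#target proteins) above a cutoff.
    """
    best = defaultdict(dict)  # protein -> {peptide: best score}
    is_decoy_protein = {}
    for protein, peptide, score, is_decoy in psms:
        peptides = best[protein]
        peptides[peptide] = max(score, peptides.get(peptide, float("-inf")))
        is_decoy_protein[protein] = is_decoy

    # Keep only proteins supported by at least `min_peptides` distinct
    # peptides, scored here by the sum of their best peptide scores.
    scored = sorted(
        (
            (sum(peptides.values()), protein)
            for protein, peptides in best.items()
            if len(peptides) >= min_peptides
        ),
        reverse=True,
    )

    accepted = []  # target proteins in descending score order
    targets = decoys = keep = 0
    for _, protein in scored:
        if is_decoy_protein[protein]:
            decoys += 1
        else:
            targets += 1
            accepted.append(protein)
        # Remember the deepest rank at which the estimated FDR still holds.
        if targets and decoys / targets <= max_fdr:
            keep = len(accepted)
    return accepted[:keep]


example_psms = [
    ("P1", "pepA", 10.0, False), ("P1", "pepB", 9.0, False),
    ("P2", "pepC", 8.0, False), ("P2", "pepD", 7.0, False),
    ("D1", "pepE", 3.0, True), ("D1", "pepF", 2.0, True),
    ("P3", "pepG", 1.0, False), ("P3", "pepH", 1.0, False),
    ("P4", "pepI", 20.0, False),  # single peptide, excluded
]
print(filter_proteins(example_psms))
```

In this toy input, `P4` is dropped despite its high score because it has only one peptide, and `P3` is dropped because it ranks below a decoy protein, so admitting it would push the estimated FDR above the cutoff.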
