Identification of new protein coding sequences and signal peptidase cleavage sites of Helicobacter pylori strain 26695 by proteogenomics.

Affiliation

Department of Proteomics, UFZ, Helmholtz-Centre for Environmental Research Leipzig, 04318 Leipzig, Germany.

Publication information

J Proteomics. 2013 Jun 28;86:27-42. doi: 10.1016/j.jprot.2013.04.036. Epub 2013 May 9.

DOI:10.1016/j.jprot.2013.04.036

PMID:23665149

Abstract

Correct annotation of protein-coding genes is the basis of conventional data analysis in proteomic studies. Nevertheless, most protein sequence databases rely almost exclusively on gene-finding software and therefore inevitably miss some protein annotations or contain errors. Proteogenomics tries to overcome these issues by matching MS data directly against a genome sequence database. Here we report an in-depth proteogenomics study of Helicobacter pylori strain 26695. MS data were searched against a combined database of the NCBI annotations and a six-frame translation of the genome. Database searches with Mascot and X! Tandem revealed 1115 proteins, each identified by at least two peptides, at a peptide false discovery rate below 1%. This represents 71% of the predicted proteome. So far, this is the most extensive proteome study of Helicobacter pylori. Our proteogenomic approach unambiguously identified four previously missed annotations and furthermore allowed us to correct the sequences of six annotated proteins. Since secreted proteins are often involved in pathogenic processes, we further investigated signal peptidase cleavage sites. By applying a database search that accommodates the identification of semi-specifically cleaved peptides, 63 previously unknown signal peptides were detected. The motif LXA proved to be the predominant recognition sequence for signal peptidases.
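The six-frame translation mentioned above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`six_frame_translation`, `orf_fragments`) and the minimum fragment length are hypothetical, and the codon table uses the standard codon assignments, which are assumed to be adequate for a bacterial genome here (alternative start codons are not modelled).

```python
from itertools import product

# Standard codon assignments in NCBI order (T, C, A, G); '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(genome):
    """Translate both strands in all three reading frames."""
    genome = genome.upper()
    frames = {}
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def orf_fragments(protein, min_len=20):
    """Split a frame at stop codons and keep stretches of at least min_len
    residues (min_len is a hypothetical cut-off, not a value from the study)."""
    return [p for p in protein.split("*") if len(p) >= min_len]


frames = six_frame_translation("ATGGCCTAA")
print(frames["+1"], frames["-1"])
```

In a proteogenomic search, the stop-to-stop fragments from all six frames would be appended to the annotated protein database, so that spectra can match peptides outside the annotated gene models.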

BIOLOGICAL SIGNIFICANCE

The results of MS-based proteomic studies rely heavily on the correct annotation of protein-coding genes, which is the basis of conventional data analysis. However, the annotation of protein-coding sequences in genomic data is usually based on gene-finding software. These tools are limited in their prediction accuracy; in particular, determining exact gene boundaries is problematic. Thus, protein databases contain partly erroneous or incomplete sequences. Additionally, some protein sequences might be missing from the databases altogether. Proteogenomics, a combination of proteomic and genomic data analyses, is well suited to detect previously unannotated proteins and to correct erroneous sequences. For this purpose, the existing database of the investigated species is typically supplemented with a six-frame translation of the genome. Here, we studied the proteome of the major human pathogen Helicobacter pylori, which is responsible for many gastric diseases such as duodenal ulcers and gastric cancer. Our in-depth proteomic study reliably identified 1115 proteins (FDR<0.01%) by at least two peptides (FDR<1%), which represent 71% of the predicted proteome deposited at NCBI. The proteogenomic analysis of our data set resulted in the unambiguous identification of four previously missed annotations, the correction of six annotated proteins, and the detection of 63 previously unknown signal peptides. We have annotated proteins of particular biological interest, such as the ferrous iron transport protein A, the coiled-coil-rich protein HP0058 and the lipopolysaccharide biosynthesis protein HP0619. For instance, the protein HP0619 could be a drug target for the inhibition of the LPS synthesis pathway. Furthermore, the motif "LXA" proved to be the predominant recognition sequence for signal peptidase I of H. pylori. Signal peptidases are essential enzymes for the viability of bacterial cells and are involved in pathogenesis. Therefore, signal peptidases could be novel targets for antibiotics. The inclusion of the corrected and newly annotated proteins, as well as the information on signal peptide cleavage sites, will help in the study of biological pathways involved in pathogenesis or drug response of H. pylori.
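How a semi-specifically cleaved peptide can point to a signal peptidase cleavage site may be sketched as follows. This is an illustration under stated assumptions, not the method used in the study: trypsin is assumed to cleave after K or R (the proline rule is ignored), the LXA motif is assumed to occupy positions -3 to -1 relative to the cleavage site, the length window of 10 to 45 residues is a hypothetical filter, and the example protein sequence is invented.

```python
def is_tryptic_n_term(protein, start):
    """True if the peptide N-terminus at 0-based index `start` is explained by
    trypsin (preceded by K/R), by the protein start, or by loss of the
    initiator methionine."""
    if start == 0:
        return True
    if start == 1 and protein[0] == "M":
        return True
    return protein[start - 1] in "KR"


def is_tryptic_c_term(protein, end):
    """True if the peptide ends with K/R or at the protein C-terminus
    (`end` is exclusive)."""
    return end == len(protein) or protein[end - 1] in "KR"


def candidate_cleavage_site(protein, peptide, min_len=10, max_len=45):
    """Return the putative signal peptide length and the residues at -3..-1
    if the peptide is semi-tryptic with a non-tryptic N-terminus near the
    protein start; otherwise return None. min_len and max_len are
    hypothetical thresholds."""
    start = protein.find(peptide)
    if start == -1:
        return None
    end = start + len(peptide)
    if is_tryptic_n_term(protein, start) or not is_tryptic_c_term(protein, end):
        return None
    if not min_len <= start <= max_len:
        return None
    return {"signal_length": start, "motif": protein[start - 3:start]}


def matches_lxa(motif):
    """Check for L at -3 and A at -1, with any residue at -2."""
    return len(motif) == 3 and motif[0] == "L" and motif[2] == "A"


# Invented example: a 16-residue signal peptide ending in L-N-A.
protein = "MKKTLLALSLFLSLNA" + "EDTPVSEYAK" + "GGSDLR"
site = candidate_cleavage_site(protein, "EDTPVSEYAK")
print(site, matches_lxa(site["motif"]))
```

A fully tryptic peptide such as `GGSDLR` in the example returns `None`, because its N-terminus is already explained by trypsin and carries no information about signal peptidase processing.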

Similar articles

Deep coverage of the Escherichia coli proteome enables the assessment of false discovery rates in simple proteogenomic experiments.

Mol Cell Proteomics. 2013 Nov;12(11):3420-30. doi: 10.1074/mcp.M113.029165. Epub 2013 Aug 1.

[Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes].

Yi Chuan Xue Bao. 2004 May;31(5):431-43.

False discovery rate: the Achilles' heel of proteogenomics.

Brief Bioinform. 2022 Sep 20;23(5). doi: 10.1093/bib/bbac163.

Integrated Proteomic Pipeline Using Multiple Search Engines for a Proteogenomic Study with a Controlled Protein False Discovery Rate.

J Proteome Res. 2016 Nov 4;15(11):4082-4090. doi: 10.1021/acs.jproteome.6b00376. Epub 2016 Aug 30.

Moving from unsequenced to sequenced genome: reanalysis of the proteome of Leishmania donovani.

J Proteomics. 2014 Jan 31;97:48-61. doi: 10.1016/j.jprot.2013.04.021. Epub 2013 May 9.

Proteogenomic mapping of Mycoplasma hyopneumoniae virulent strain 232.

BMC Genomics. 2014 Jul 8;15(1):576. doi: 10.1186/1471-2164-15-576.

Proteomics in non-human primates: utilizing RNA-Seq data to improve protein identification by mass spectrometry in vervet monkeys.

BMC Genomics. 2017 Nov 13;18(1):877. doi: 10.1186/s12864-017-4279-0.

An integrative strategy to identify the entire protein coding potential of prokaryotic genomes by proteogenomics.

Genome Res. 2017 Dec;27(12):2083-2095. doi: 10.1101/gr.218255.116. Epub 2017 Nov 15.

Flexible Data Analysis Pipeline for High-Confidence Proteogenomics.

J Proteome Res. 2016 Dec 2;15(12):4686-4695. doi: 10.1021/acs.jproteome.6b00765. Epub 2016 Nov 10.

Cited by

Deep proteome coverage advances knowledge of Treponema pallidum protein expression profiles during infection.

Sci Rep. 2023 Oct 25;13(1):18259. doi: 10.1038/s41598-023-45219-8.

Divide and conquer: genetics, mechanism, and evolution of the ferrous iron transporter Feo in .

Front Microbiol. 2023 Jul 4;14:1219359. doi: 10.3389/fmicb.2023.1219359. eCollection 2023.

A Practical Guide to Small Protein Discovery and Characterization Using Mass Spectrometry.

J Bacteriol. 2022 Jan 18;204(1):e0035321. doi: 10.1128/JB.00353-21. Epub 2021 Nov 8.

Discovery of novel community-relevant small proteins in a simplified human intestinal microbiome.

Microbiome. 2021 Feb 23;9(1):55. doi: 10.1186/s40168-020-00981-z.

Are Antisense Proteins in Prokaryotes Functional?

Front Mol Biosci. 2020 Aug 14;7:187. doi: 10.3389/fmolb.2020.00187. eCollection 2020.

Expanding the Vocabulary of Peptide Signals in .

Front Cell Infect Microbiol. 2019 Jun 6;9:194. doi: 10.3389/fcimb.2019.00194. eCollection 2019.

Terminomics Methodologies and the Completeness of Reductive Dimethylation: A Meta-Analysis of Publicly Available Datasets.

Proteomes. 2019 Mar 29;7(2):11. doi: 10.3390/proteomes7020011.

Unraveling the hidden universe of small proteins in bacterial genomes.

Mol Syst Biol. 2019 Feb 22;15(2):e8290. doi: 10.15252/msb.20188290.

Peptimapper: proteogenomics workflow for the expert annotation of eukaryotic genomes.

BMC Genomics. 2019 Jan 17;20(1):56. doi: 10.1186/s12864-019-5431-9.

Proteome Data Improves Protein Function Prediction in the Interactome of .

Mol Cell Proteomics. 2018 May;17(5):961-973. doi: 10.1074/mcp.RA117.000474. Epub 2018 Feb 1.
