Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA.
BMC Genomics. 2011 Aug 25;12:433. doi: 10.1186/1471-2164-12-433.
Complete and accurate genome annotation is crucial for comprehensive and systematic studies of biological systems. However, for most new genomes, protein-coding genes are determined almost entirely by inference from computational predictions, which have significant documented error rates (>15%). Furthermore, gene prediction programs provide no information on biologically important post-translational processing events that are critical for protein function.
We experimentally annotated the bacterial pathogen Salmonella Typhimurium 14028 using "shotgun" proteomics to accurately uncover the translational landscape and post-translational features. The data provide protein-level experimental validation for approximately half of the predicted protein-coding genes in Salmonella and suggest revisions to several genes that appear to have incorrectly assigned translational start sites, including a potential novel alternate start codon. Additionally, we uncovered 12 non-annotated genes missed by gene prediction programs, as well as evidence suggesting a role for one of these novel ORFs in Salmonella pathogenesis. We also characterized post-translational features in the Salmonella genome, including chemical modifications and proteolytic cleavages. We find that bacteria have a much larger and more complex repertoire of chemical modifications than previously thought, including several novel modifications. Our in vivo proteolysis data identified more than 130 signal peptide and N-terminal methionine cleavage events critical for protein function.
This work highlights several ways in which application of proteomics data can improve the quality of genome annotations to facilitate novel biological insights and provides a comprehensive proteome map of Salmonella as a resource for systems analysis.
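The idea of using peptide evidence to locate genes missed by prediction programs can be illustrated with a minimal sketch. It assumes a simplified setting: an identified peptide sequence is searched by exact string match against all six reading frames of a genomic DNA sequence. The abstract does not describe the study's actual search pipeline, so the function names (`six_frame_translations`, `find_peptide`) and the toy sequence below are hypothetical, and the sketch uses only the standard codon table.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels ('+1'..'+3', '-1'..'-3') to translated sequences."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def find_peptide(dna, peptide):
    """Return (frame, amino-acid position) for every exact peptide match."""
    hits = []
    for frame, protein in six_frame_translations(dna).items():
        pos = protein.find(peptide)
        while pos != -1:
            hits.append((frame, pos))
            pos = protein.find(peptide, pos + 1)
    return hits


# Toy example (hypothetical sequence, not from the Salmonella genome).
print(find_peptide("ATGAAAGCGTAA", "MKA"))
```

A hit in a region with no annotated gene would be a candidate novel ORF under these assumptions. A real analysis would instead score tandem mass spectra against the translated database and control the false discovery rate, which this sketch does not attempt.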