Robert Koch Institute, Methodenentwicklung und Forschungsinfrastruktur (MF), Berlin, Germany.
University of Technical Sciences Braunschweig, Institute for Microbiology, Braunschweig, Germany.
PLoS Genet. 2021 Jun 1;17(6):e1009585. doi: 10.1371/journal.pgen.1009585. eCollection 2021 Jun.
Small proteins play essential roles in bacterial physiology and virulence; however, automated algorithms for genome annotation are often not yet able to accurately predict the corresponding genes. The accuracy and reliability of genome annotations, particularly for small open reading frames (sORFs), can be significantly improved by integrating protein evidence from experimental approaches. Here we present a highly optimized and flexible bioinformatics workflow for bacterial proteogenomics covering all steps from (i) generation of protein databases, (ii) database searches, and (iii) peptide-to-genome mapping to (iv) visualization of results. We used the workflow to identify high-quality peptide spectrum matches (PSMs) for small proteins (≤ 100 aa, SP100) in Staphylococcus aureus Newman. Protein extracts from S. aureus were subjected to different experimental workflows for protein digestion and prefractionation and measured with highly sensitive mass spectrometers. In total, 175 proteins with up to 100 aa (SP100) were identified. Of these, 24 (ranging from 9 to 99 aa) were novel and not contained in the genome annotation used. A total of 144 SP100 are highly conserved and were found in at least 50% of the publicly available S. aureus genomes, while 127 are additionally conserved in other staphylococci. Almost half of the identified SP100 were basic, suggesting a role in binding to more acidic molecules such as nucleic acids or phospholipids.
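Steps (i) and (iii) of such a workflow can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `small_orfs` and `map_peptide`, the start codon set, and the 9 to 100 aa window (taken from the size range reported above) are assumptions chosen for illustration. The sketch translates all six reading frames, keeps ORFs in the SP100 size range as candidate database entries, and maps an identified peptide back to forward-strand genome coordinates.

```python
from itertools import product

# Standard codon assignments (also used by bacterial translation table 11),
# listed in NCBI order with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Assumption: the three most common bacterial start codons.
START_CODONS = {"ATG", "GTG", "TTG"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def small_orfs(genome, max_aa=100, min_aa=9):
    """Yield (strand, start, end, protein) for ORFs of min_aa..max_aa residues.

    start and end are 0-based, end-exclusive forward-strand coordinates and
    include the stop codon. Only the first start codon after the previous
    stop is used in each frame (hypothetical simplification).
    """
    n = len(genome)
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            orf_start = None
            for i in range(frame, n - 2, 3):
                codon = seq[i:i + 3]
                if orf_start is None and codon in START_CODONS:
                    orf_start = i
                elif orf_start is not None and CODON_TABLE.get(codon) == "*":
                    # Any start codon is translated as methionine.
                    protein = "M" + "".join(
                        CODON_TABLE.get(seq[j:j + 3], "X")
                        for j in range(orf_start + 3, i, 3)
                    )
                    if min_aa <= len(protein) <= max_aa:
                        if strand == "+":
                            yield strand, orf_start, i + 3, protein
                        else:
                            yield strand, n - (i + 3), n - orf_start, protein
                    orf_start = None


def map_peptide(peptide, orfs):
    """Return (strand, start, end) genome coordinates of each peptide match."""
    hits = []
    for strand, start, end, protein in orfs:
        pos = protein.find(peptide)
        while pos != -1:
            if strand == "+":
                g_start = start + 3 * pos
                g_end = g_start + 3 * len(peptide)
            else:
                g_end = end - 3 * pos
                g_start = g_end - 3 * len(peptide)
            hits.append((strand, g_start, g_end))
            pos = protein.find(peptide, pos + 1)
    return hits
```

In practice the candidate proteins would be written to a FASTA file for the database search, and the peptide sequences would come from filtered PSMs rather than being supplied by hand.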