Biological Sciences and Computational Sciences and Mathematics Division, Pacific Northwest National Laboratory, Richland, Washington, United States of America.
PLoS One. 2010 Nov 12;5(11):e13968. doi: 10.1371/journal.pone.0013968.
Global protein identification through current proteomics methods typically depends on the availability of sequenced genomes. Despite increasingly high-throughput sequencing technologies, this information is not available for every microorganism and is rarely available for entire microbial communities. Nevertheless, the protein-level homology that exists between related bacteria makes it possible to extract biological information from the proteome of an organism or microbial community by using the genomic sequences of a near-neighbor organism. Here, we demonstrate a trans-organism search strategy for determining the extent to which near-neighbor genome sequences can be applied to identify proteins in unsequenced environmental isolates. In proof-of-concept testing, we found that near-neighbor genomes within a CLUSTAL W distance of 0.089 successfully identified a high percentage of an organism's proteins. Applying this strategy to environmental bacterial isolates that lack sequenced genomes but share 16S rDNA sequence similarity with Shewanella resulted in the identification of 300-500 proteins in each strain. The majority of identified pathways mapped to core processes, as well as to processes unique to the Shewanellae, in particular those reflecting the presence of c-type cytochromes. Examples of core functional categories include energy metabolism, protein and nucleotide synthesis, and cofactor biosynthesis; observing these conserved processes allows bacteria to be classified. Additionally, within these core functionalities, we observed proteins involved in the alternative lactate utilization pathway recently described in Shewanella.
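As a hedged illustration of the near-neighbor selection criterion, the Python sketch below keeps only candidate reference sequences whose pairwise distance to a query is at most 0.089. It assumes pre-aligned 16S rDNA sequences (the abstract does not state which sequences the distance was computed on) and substitutes an uncorrected identity-based distance (1 minus the fractional identity over gap-free columns) for the CLUSTAL W distance. The function names and toy sequences are hypothetical, and this is not the authors' pipeline.

```python
"""Sketch: pick near-neighbor reference sequences for a trans-organism search.

Assumption: sequences are already aligned (equal length, '-' for gaps), and an
uncorrected identity-based distance stands in for the CLUSTAL W distance.
"""

DISTANCE_CUTOFF = 0.089  # threshold reported in the abstract


def pairwise_distance(a: str, b: str) -> float:
    """Return 1 - fractional identity over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip any column containing a gap
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 1.0 - identical / compared


def select_near_neighbors(query: str, references: dict[str, str],
                          cutoff: float = DISTANCE_CUTOFF) -> list[tuple[str, float]]:
    """Return (name, distance) pairs for references within the cutoff, closest first."""
    hits = []
    for name, seq in references.items():
        d = pairwise_distance(query, seq)
        if d <= cutoff:
            hits.append((name, d))
    return sorted(hits, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical toy alignment; real input would be full-length aligned sequences.
    query = "ACGTACGTACGTACGTACGT"
    refs = {
        "ref_close": "ACGTACGTACGTACGTACGA",   # 1 mismatch in 20 columns
        "ref_far": "ACGAACGAACGTACGTACGA",     # 3 mismatches in 20 columns
        "ref_gapped": "ACGTACGTAC--ACGTACGT",  # 0 mismatches in 18 gap-free columns
    }
    for name, d in select_near_neighbors(query, refs):
        print(f"{name}\t{d:.3f}")
```

With these toy inputs, `ref_gapped` (distance 0.0) and `ref_close` (0.05) fall within the cutoff, while `ref_far` (0.15) is excluded. A real analysis would more likely take distances directly from a CLUSTAL W alignment than recompute them this way.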