School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore.
BMC Biol. 2020 Sep 3;18(1):114. doi: 10.1186/s12915-020-00846-9.
Bacterial resistance to antibiotics is a growing health problem that is projected to cause more deaths than cancer by 2050. Consequently, novel antibiotics are urgently needed. Because more than half of the available antibiotics target the structurally conserved bacterial ribosome, factors involved in protein synthesis are prime targets for the development of novel antibiotics. However, experimental identification of these potential antibiotic target proteins can be labor-intensive and challenging, as these proteins are likely to be poorly characterized and specific to only a few bacteria. Here, we use a bioinformatics approach to identify novel components of protein synthesis.
In order to identify these novel proteins, we established a Large-Scale Transcriptomic Analysis Pipeline in Crowd (LSTrAP-Crowd), where 285 individuals processed 26 terabytes of RNA-sequencing data of the 17 most notorious bacterial pathogens. In total, the crowd processed 26,269 RNA-seq experiments and used the data to construct gene co-expression networks, which were used to identify more than a hundred uncharacterized genes that were transcriptionally associated with protein synthesis. We provide the identity of these genes together with the processed gene expression data.
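The co-expression step described above can be illustrated with a minimal sketch. It assumes Pearson correlation as the similarity measure and a fixed cutoff of 0.7, neither of which is stated in the abstract. The function name `coexpressed_with`, the gene `hypX`, and all expression values are hypothetical and serve only to show the idea: genes whose expression tracks that of known protein-synthesis genes across many experiments become candidates.

```python
import numpy as np

def coexpressed_with(expr, genes, bait, threshold=0.7):
    """Return (gene, r) pairs for non-bait genes whose best Pearson
    correlation with any bait gene is at least `threshold`.

    expr  : 2-D array, one row per gene, one column per RNA-seq experiment
    genes : list of gene names, in the same order as the rows of expr
    bait  : set of known protein-synthesis genes (assumed to be in genes)
    """
    corr = np.corrcoef(expr)  # gene-by-gene Pearson correlation matrix
    bait_idx = [genes.index(b) for b in bait]
    hits = []
    for i, gene in enumerate(genes):
        if gene in bait:
            continue
        best = max(corr[i, j] for j in bait_idx)
        if best >= threshold:
            hits.append((gene, float(best)))
    # Strongest associations first
    return sorted(hits, key=lambda kv: kv[1], reverse=True)

# Toy data (invented values): two ribosomal-protein genes as bait,
# one hypothetical uncharacterized gene, one unrelated gene.
genes = ["rpsA", "rplB", "hypX", "metK"]
expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],   # rpsA
    [2.0, 4.0, 6.0, 8.0, 11.0],  # rplB
    [1.1, 2.1, 2.9, 4.2, 5.0],   # hypX (hypothetical)
    [5.0, 1.0, 4.0, 2.0, 3.0],   # metK
])
hits = coexpressed_with(expr, genes, bait={"rpsA", "rplB"})
```

In this toy example only `hypX` passes the cutoff, which makes it the kind of uncharacterized gene that would be flagged for follow-up. A real analysis would work on normalized expression values across thousands of experiments and would likely use a more robust ranking than a single correlation threshold.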
We identified genes related to protein synthesis in common bacterial pathogens and thus provide a resource of potential antibiotic development targets for experimental validation. The data can be used to explore additional vulnerabilities of bacteria, while our approach demonstrates how the processing of gene expression data can be easily crowd-sourced.