Centre for Health Informatics, University of New South Wales, Sydney, Australia.
PLoS One. 2011 Apr 4;6(4):e17964. doi: 10.1371/journal.pone.0017964.
The phylogenetic profile of a gene reflects its evolutionary history and can be defined as the pattern of presence or absence of that gene across a set of reference genomes. Phylogenetic profiles have been used to help predict gene function. However, the hypothesis that they can also aid the discovery of bacterial virulence factors has not been fully examined. In this paper, we test this hypothesis and report a computational pipeline designed to identify previously unknown bacterial virulence genes, using group B streptococcus (GBS) as an example. Phylogenetic profiles of all GBS genes across 467 bacterial reference genomes were determined by candidate-against-all BLAST searches, and these profiles were then used by machine learning models to identify candidate virulence genes. Evaluation experiments with known GBS virulence genes suggested good functional and model consistency in cross-validation analyses (areas under the ROC curve of 0.80 and 0.98, respectively). Inspection of the top 10 genes in each of the 15 virulence functional groups revealed at least 15 (of 119) genes whose homologs are implicated in virulence in other human pathogens but which had not previously been recognized as potential virulence genes in GBS. Many of the highly ranked genes encode hypothetical proteins with possible roles in GBS virulence. Our approach has thus identified a set of genes that may affect the virulence of GBS and that are candidates for further in vitro and in vivo investigation. The computational pipeline can also be extended to in silico analysis of virulence determinants in other bacterial pathogens.
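As a rough illustration of the idea described in the abstract, the sketch below builds binary phylogenetic profiles from BLAST-style hits and ranks unlabelled genes by how closely their profiles resemble those of known virulence genes. It is a minimal sketch under stated assumptions, not the pipeline reported in the paper: the E-value cutoff, the gene and genome names, and the function names are all hypothetical, and Jaccard similarity is used here as a simple stand-in for the machine learning models, which the abstract does not specify.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Assumed significance threshold; the abstract does not state the cutoff used.
E_VALUE_CUTOFF = 1e-5


def build_profiles(
    hits: Iterable[Tuple[str, str, float]],
    genomes: List[str],
    cutoff: float = E_VALUE_CUTOFF,
) -> Dict[str, List[int]]:
    """Turn (gene, genome, e_value) hits into binary presence/absence vectors."""
    index = {genome: i for i, genome in enumerate(genomes)}
    profiles: Dict[str, List[int]] = {}
    for gene, genome, e_value in hits:
        vector = profiles.setdefault(gene, [0] * len(genomes))
        if e_value <= cutoff:
            vector[index[genome]] = 1  # a homolog is counted as present
    return profiles


def jaccard(a: List[int], b: List[int]) -> float:
    """Jaccard similarity between two binary profiles."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def rank_candidates(
    profiles: Dict[str, List[int]], known: Set[str]
) -> List[Tuple[str, float]]:
    """Score each unlabelled gene by its best match to a known virulence gene."""
    known_vectors = [profiles[g] for g in known if g in profiles]
    scores = []
    for gene, vector in profiles.items():
        if gene in known:
            continue
        best = max((jaccard(vector, k) for k in known_vectors), default=0.0)
        scores.append((gene, best))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy data: hypothetical genes and reference genomes.
genomes = ["genome_A", "genome_B", "genome_C", "genome_D"]
hits = [
    ("known_vf", "genome_A", 1e-30),
    ("known_vf", "genome_B", 1e-20),
    ("known_vf", "genome_C", 0.5),  # above the cutoff, treated as absent
    ("gene_x", "genome_A", 1e-12),
    ("gene_x", "genome_B", 1e-8),
    ("gene_y", "genome_C", 1e-40),
    ("gene_y", "genome_D", 1e-15),
]

profiles = build_profiles(hits, genomes)
ranking = rank_candidates(profiles, known={"known_vf"})
print(ranking)
```

In this toy example, `gene_x` shares its presence/absence pattern with the known virulence gene and therefore ranks above `gene_y`. A real analysis would replace the similarity score with a trained classifier and evaluate it by cross-validation, as the abstract describes.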