Unidad de Bioinformática, Institut Pasteur Montevideo, Montevideo, Uruguay.
PLoS One. 2012;7(8):e42144. doi: 10.1371/journal.pone.0042144. Epub 2012 Aug 6.
Although there have been great advances in understanding bacterial pathogenesis, there is still a lack of integrative information about what makes a bacterium a human pathogen. The advent of high-throughput sequencing technologies has dramatically increased the number of completed bacterial genomes, for both known human pathogenic and non-pathogenic strains; this information is now available to investigate the genetic features that determine pathogenic phenotypes in bacteria. In this work we determined presence/absence patterns of 814 different virulence-related genes among more than 600 finished bacterial genomes from both human pathogenic and non-pathogenic strains, belonging to different taxonomic groups (e.g., Actinobacteria, Gammaproteobacteria, and Firmicutes). Classification of human pathogens versus non-pathogens reached an accuracy of 95% under a cross-validation scheme with in-fold feature selection. A reduced subset of 120 highly informative genes is presented and applied to an external validation set. The statistical model was implemented in the BacFier v1.0 software (freely available at http://bacfier.googlecode.com/files/Bacfier_v1_0.zip), which displays not only the prediction (pathogen/non-pathogen) and an associated probability of pathogenicity, but also the presence/absence vector for the analyzed genes, so that the subset of virulence genes responsible for the classification of the analyzed genome can be identified. Furthermore, we discuss the biological relevance for bacterial pathogenesis of the core set of genes, which corresponds to eight functional categories, all with evident and documented association with the phenotypes of interest. We also analyze which functional categories of virulence genes are most distinctive for pathogenicity in each taxonomic group; this appears to be a new kind of information and could lead to important evolutionary conclusions.
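The validation scheme described above, in which feature selection is repeated inside each cross-validation fold so that held-out genomes never influence which genes are chosen, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the classifier or the selection criterion, so a Bernoulli naive Bayes model and a simple frequency-difference score are swapped in, and the function names (`select_genes`, `fit_naive_bayes`, `pathogen_probability`, `cross_validate`, `make_toy_data`) and the synthetic data are hypothetical, not part of BacFier.

```python
import math
import random


def select_genes(X, y, k):
    """Rank genes by |P(present | pathogen) - P(present | non-pathogen)|; keep the top k."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        f_pos = sum(r[j] for r in pos) / len(pos)
        f_neg = sum(r[j] for r in neg) / len(neg)
        scores.append(abs(f_pos - f_neg))
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def fit_naive_bayes(X, y, genes):
    """Bernoulli naive Bayes on the selected genes (an assumed stand-in classifier)."""
    model = {"genes": genes, "prior": {}, "p": {}}
    for c in (0, 1):
        rows = [row for row, label in zip(X, y) if label == c]
        model["prior"][c] = len(rows) / len(X)
        # Laplace smoothing keeps every presence probability strictly inside (0, 1).
        model["p"][c] = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2) for j in genes]
    return model


def pathogen_probability(model, row):
    """Posterior probability that a presence/absence vector belongs to a pathogen."""
    log_post = {}
    for c in (0, 1):
        lp = math.log(model["prior"][c])
        for j, p in zip(model["genes"], model["p"][c]):
            lp += math.log(p if row[j] else 1.0 - p)
        log_post[c] = lp
    m = max(log_post.values())  # subtract the maximum for numerical stability
    e0, e1 = math.exp(log_post[0] - m), math.exp(log_post[1] - m)
    return e1 / (e0 + e1)


def cross_validate(X, y, n_folds=5, k=3, seed=0):
    """Stratified k-fold accuracy with gene selection redone inside every fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for c in (0, 1):
        idx = [i for i, label in enumerate(y) if label == c]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        genes = select_genes(X_tr, y_tr, k)  # selection sees training genomes only
        model = fit_naive_bayes(X_tr, y_tr, genes)
        for i in test_idx:
            pred = 1 if pathogen_probability(model, X[i]) >= 0.5 else 0
            correct += int(pred == y[i])
    return correct / len(y)


def make_toy_data(n_genomes=60, n_genes=20, n_informative=3, seed=1):
    """Synthetic presence/absence matrix; only the first few genes track the label."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_genomes):
        label = i % 2
        row = []
        for j in range(n_genes):
            if j < n_informative:
                p = 0.9 if label == 1 else 0.1
            else:
                p = 0.5
            row.append(1 if rng.random() < p else 0)
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print("cross-validated accuracy:", cross_validate(X, y))
```

Selecting genes on the full data set before splitting would let the held-out genomes leak into the model and inflate the accuracy estimate, which is why `select_genes` is called inside the fold loop. The per-genome output of `pathogen_probability`, read alongside the presence/absence values of the selected genes, mirrors the kind of report the abstract attributes to BacFier.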