Department of Microbiology-Immunology, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA.
Department of Medicine, Division of Infectious Diseases, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA.
mBio. 2020 Aug 25;11(4):e01527-20. doi: 10.1128/mBio.01527-20.
Variation in the genome of Pseudomonas aeruginosa, an important pathogen, can have dramatic impacts on the bacterium's ability to cause disease. We therefore asked whether it was possible to predict the virulence of P. aeruginosa isolates based on their genomic content. We applied a machine learning approach to a genetically and phenotypically diverse collection of 115 clinical P. aeruginosa isolates using genomic information and corresponding virulence phenotypes in a mouse model of bacteremia. We defined the accessory genome of these isolates through the presence or absence of accessory genomic elements (AGEs), sequences present in some strains but not others. Machine learning models trained using AGEs were predictive of virulence, with a mean nested cross-validation accuracy of 75% using the random forest algorithm. However, individual AGEs did not have a large influence on the algorithm's performance, suggesting instead that virulence predictions are derived from a diffuse genomic signature. These results were validated with an independent test set of 25 P. aeruginosa isolates whose virulence was predicted with 72% accuracy. Machine learning models trained using core genome single-nucleotide variants and whole-genome k-mers also predicted virulence. Our findings are a proof of concept for the use of bacterial genomes to predict pathogenicity in P. aeruginosa and highlight the potential of this approach for predicting patient outcomes.

Pseudomonas aeruginosa is a clinically important Gram-negative opportunistic pathogen. P. aeruginosa shows a large degree of genomic heterogeneity both through variation in sequences found throughout the species (core genome) and through the presence or absence of sequences in different isolates (accessory genome). P. aeruginosa isolates also differ markedly in their ability to cause disease. In this study, we used machine learning to predict the virulence level of P. aeruginosa isolates in a mouse bacteremia model based on genomic content. We show that both the accessory and core genomes are predictive of virulence. This study provides a machine learning framework to investigate relationships between bacterial genomes and complex phenotypes such as virulence.
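The nested cross-validation used to evaluate the AGE-based models can be illustrated with a minimal sketch. It is not the authors' pipeline. To stay dependency-free it swaps the random forest for a simple k-nearest-neighbour classifier on Hamming distance, and it runs on synthetic presence/absence data rather than real isolates. All function names, fold counts, and candidate values of k are assumptions made for illustration. What it does show is the structure of the procedure: the inner loop tunes a hyperparameter using only the outer training isolates, so the outer accuracy is not inflated by tuning.

```python
import random

def hamming(a, b):
    """Number of AGEs whose presence/absence differs between two isolates."""
    return sum(x != y for x, y in zip(a, b))

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training isolates closest to x."""
    nearest = sorted(range(len(train_X)), key=lambda i: hamming(train_X[i], x))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > k else 0

def accuracy(X, y, fit_idx, eval_idx, k):
    """Fraction of eval_idx isolates classified correctly from fit_idx isolates."""
    fit_X = [X[i] for i in fit_idx]
    fit_y = [y[i] for i in fit_idx]
    hits = sum(knn_predict(fit_X, fit_y, X[i], k) == y[i] for i in eval_idx)
    return hits / len(eval_idx)

def make_folds(indices, n_folds):
    """Split a list of indices into n_folds interleaved folds."""
    return [indices[i::n_folds] for i in range(n_folds)]

def nested_cv(X, y, ks=(1, 3, 5), outer=5, inner=3, seed=0):
    """Mean outer-fold accuracy, with k chosen inside each outer training set."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    outer_scores = []
    for held_out in make_folds(idx, outer):
        held = set(held_out)
        train = [i for i in idx if i not in held]

        def inner_score(k):
            # Inner loop: score k using only the outer training isolates.
            scores = []
            for val in make_folds(train, inner):
                val_set = set(val)
                fit = [i for i in train if i not in val_set]
                scores.append(accuracy(X, y, fit, val, k))
            return sum(scores) / len(scores)

        best_k = max(ks, key=inner_score)
        # Outer loop: evaluate the tuned model on isolates it has never seen.
        outer_scores.append(accuracy(X, y, train, held_out, best_k))
    return sum(outer_scores) / len(outer_scores)

# Synthetic data (assumption): 60 isolates x 40 AGEs, 1 = high virulence.
# Every AGE is only weakly associated with the label, mimicking a diffuse signal.
rng = random.Random(42)
y = [i % 2 for i in range(60)]
X = [[int(rng.random() < (0.75 if label else 0.25)) for _ in range(40)] for label in y]

score = nested_cv(X, y)
print(f"mean nested cross-validation accuracy: {score:.2f}")
```

With real data, each row of `X` would be one isolate's AGE presence/absence vector and `y` its virulence class in the mouse model. A random forest implementation could then replace `knn_predict` without changing the fold logic.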