Core Genome Allelic Profiles of Clinical Klebsiella pneumoniae Strains Using a Random Forest Algorithm Based on Multilocus Sequence Typing Scheme for Hypervirulence Analysis.

Author information

Department of Infectious Diseases, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China.

Key Laboratory of Microbial Technology and Bioinformatics of Zhejiang Province, Hangzhou, China.

Publication information

J Infect Dis. 2020 Mar 16;221(Suppl 2):S263-S271. doi: 10.1093/infdis/jiz562.

Abstract

BACKGROUND

Hypervirulent Klebsiella pneumoniae (hvKP) infections can cause high morbidity and mortality because the strains are invasive and highly virulent. However, there are no effective tools or biomarkers to discriminate between hvKP and nonhypervirulent K. pneumoniae (nhvKP) strains. We aimed to use a random forest algorithm to predict hvKP from core-genome data.

METHODS

In total, 272 K. pneumoniae strains were collected from 20 tertiary hospitals in China and divided into hvKP and nhvKP groups according to clinical criteria. Clinical data comparisons, whole-genome sequencing, virulence profile analysis, and core genome multilocus sequence typing (cgMLST) were performed. We then built a random forest predictive model based on the cgMLST scheme to identify hvKP prospectively. A random forest is an ensemble learning method that builds multiple decision trees during training; each tree outputs its own prediction for a given input, and the forest combines these predictions. The predictive ability of the model was assessed by the area under the receiver operating characteristic curve (AUC).
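The ensemble idea and the AUC metric described above can be illustrated with a small sketch. Everything in it is an assumption made for illustration rather than the authors' pipeline: the toy allelic profiles, the function names (`fit_stump`, `fit_forest`, `predict_proba`, `roc_auc`), the use of one-split trees, and the "allele equals a given value" split rule are all hypothetical. A real analysis would use thousands of cgMLST loci, deeper trees, and an established library.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels (1 = hvKP, 0 = nhvKP)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def fit_stump(X, y, rng, n_loci_tried):
    """Fit a one-split tree on a bootstrap sample, testing a random subset of loci."""
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]            # bootstrap sample
    Xb = [X[i] for i in idx]
    yb = [y[i] for i in idx]
    prior = sum(yb) / n
    loci = rng.sample(range(len(X[0])), n_loci_tried)     # random locus subset
    best = None
    for j in loci:
        for allele in sorted(set(row[j] for row in Xb)):
            left = [t for row, t in zip(Xb, yb) if row[j] == allele]
            right = [t for row, t in zip(Xb, yb) if row[j] != allele]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                p_left = sum(left) / len(left)
                p_right = sum(right) / len(right) if right else prior
                best = (score, j, allele, p_left, p_right)
    return best[1:]  # (locus, allele, P(hvKP | match), P(hvKP | no match))


def fit_forest(X, y, n_trees=25, n_loci_tried=2, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample and locus subset."""
    rng = random.Random(seed)
    return [fit_stump(X, y, rng, n_loci_tried) for _ in range(n_trees)]


def predict_proba(forest, profile):
    """Average the per-tree hvKP probabilities for one allelic profile."""
    votes = [p_l if profile[j] == a else p_r for j, a, p_l, p_r in forest]
    return sum(votes) / len(votes)


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy allelic profiles (rows = isolates, columns = loci); invented for illustration.
# Locus 0 separates the groups; loci 1 and 2 are invariant in this toy set.
X = [[1, 7, 7], [1, 7, 7], [1, 7, 7], [1, 7, 7],
     [2, 7, 7], [2, 7, 7], [3, 7, 7], [3, 7, 7]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

forest = fit_forest(X, y)
scores = [predict_proba(forest, profile) for profile in X]
auc = roc_auc(y, scores)
print("training-set AUC on toy data:", auc)
```

Because the toy data are perfectly separable and the AUC is computed on the same isolates used for training, this number is optimistic by construction; evaluating on a held-out validation set, as the study reports doing, is the meaningful check.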

RESULTS

Patients in the hvKP group were younger than those in the nhvKP group (median age, 58.0 and 68.0 years, respectively; P < .001). More patients in the hvKP group had underlying diabetes mellitus (43.1% vs 20.1%; P < .001). Carbapenem-resistant K. pneumoniae was less common in the hvKP group (4.1% vs 63.8%; P < .001), whereas the K1/K2 serotype, sequence type (ST) 23, and positive string test results were significantly more frequent in the hvKP group. A cgMLST-based minimum spanning tree revealed that hvKP strains were scattered sporadically within nhvKP clusters. ST23 showed greater genome diversification than ST11, according to cgMLST-based allelic differences. Primary virulence factors (rmpA, iucA, positive string test result, and the presence of virulence plasmid pLVPK) were poor predictors of the hypervirulence phenotype. The random forest model based on the core genome allelic profile showed excellent predictive power in both the training and validation sets (area under the receiver operating characteristic curve, 0.987 and 0.999, respectively).
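The cgMLST-based allelic differences and the minimum spanning tree mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the software used in the study: the isolate names, the five-locus profiles, the convention that `0` marks a missing allele call, and the helper names `allelic_distance` and `minimum_spanning_tree` are all hypothetical. The tree is built with Prim's algorithm, a standard choice that the abstract does not confirm.

```python
def allelic_distance(a, b, missing=0):
    """Count loci where both profiles have a called allele and the alleles differ."""
    return sum(1 for x, y in zip(a, b)
               if x != missing and y != missing and x != y)


def minimum_spanning_tree(profiles):
    """Prim's algorithm over pairwise allelic distances.

    Returns a list of (isolate_in_tree, isolate_added, distance) edges.
    Ties are broken by isolate name so the result is deterministic.
    """
    names = list(profiles)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        d, u, v = min(
            (allelic_distance(profiles[u], profiles[v]), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        in_tree.add(v)
        edges.append((u, v, d))
    return edges


# Toy allelic profiles (allele numbers per locus); invented for illustration.
profiles = {
    "A": [1, 1, 1, 1, 1],
    "B": [1, 1, 1, 1, 2],
    "C": [1, 1, 2, 2, 2],
    "D": [3, 3, 3, 3, 0],  # last locus not called
}

edges = minimum_spanning_tree(profiles)
for u, v, d in edges:
    print(f"{u} - {v}: {d} allelic differences")
```

Comparing the spread of pairwise distances within one sequence type against another is one simple way to express the kind of diversification contrast reported for ST23 and ST11, although the abstract does not state the exact measure used.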

CONCLUSIONS

A random forest predictive model based on the core genome allelic profiles of K. pneumoniae accurately identified hypervirulent isolates.
