Systematic analysis of supervised machine learning as an effective approach to predict β-lactam resistance phenotype in Streptococcus pneumoniae.

Affiliations

State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China.

Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University 60 Nanyang Drive, Singapore, Singapore.

Publication information

Brief Bioinform. 2020 Jul 15;21(4):1347-1355. doi: 10.1093/bib/bbz056.

DOI:10.1093/bib/bbz056

PMID:31192359

Abstract

Streptococcus pneumoniae is the most common human respiratory pathogen, and β-lactam antibiotics have been used to treat infections caused by S. pneumoniae for decades. β-lactam resistance is steadily increasing in pneumococci and is mainly associated with alterations in penicillin-binding proteins (PBPs) that reduce the binding affinity of antibiotics to PBPs. However, the high variability of PBPs in clinical isolates and their mosaic gene structure hamper the prediction of resistance levels from PBP gene sequences. In this study, we developed a systematic strategy for applying supervised machine learning to predict S. pneumoniae antimicrobial susceptibility to β-lactam antibiotics. We combined published PBP sequences with minimum inhibitory concentration (MIC) values as labelled data and sequences from the NCBI database without MIC values as unlabelled data to develop an approach that uses only a fragment from pbp2x (750 bp) and a fragment from pbp2b (750 bp) to predict cefuroxime and amoxicillin resistance. We further validated the performance of the supervised learning model by constructing mutants containing randomly selected pbps and by testing additional clinical strains isolated from Chinese hospitals. In addition, we used our approach to establish associations between resistance phenotypes and the serotypes and sequence types of S. pneumoniae, which facilitates understanding of the worldwide epidemiology of S. pneumoniae.
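To make the idea of predicting a resistance phenotype from a labelled sequence fragment concrete, the following is a minimal sketch, not the authors' model. The abstract does not name the learning algorithm, so a k-nearest-neighbour vote over Hamming distance is swapped in purely for illustration. The 12 bp sequences, the MIC values, and the cutoff of 1.0 are made-up placeholders (not real pbp2x or pbp2b data and not a clinical breakpoint), and the function names `label_from_mic`, `hamming`, and `predict_knn` are hypothetical.

```python
from collections import Counter


def label_from_mic(mic, cutoff):
    """Turn an MIC value into a binary label using a caller-supplied cutoff."""
    return "resistant" if mic > cutoff else "susceptible"


def hamming(a, b):
    """Count mismatching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def predict_knn(query, labelled, k=3):
    """Majority vote over the k labelled sequences closest to the query.

    labelled is a list of (sequence, label) pairs.
    """
    nearest = sorted(labelled, key=lambda item: hamming(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Placeholder training data: (aligned fragment, MIC). All values are invented.
TOY_CUTOFF = 1.0  # illustrative only, NOT a clinical breakpoint
toy_records = [
    ("ACGTACGTACGT", 0.03),
    ("ACGTACGTACGA", 0.06),
    ("ACGTACCTACGT", 0.03),
    ("TCGAACGTTCGA", 4.0),
    ("TCGAACGTTCGT", 8.0),
    ("TCGTACGTTCGA", 2.0),
]
toy_labelled = [(seq, label_from_mic(mic, TOY_CUTOFF)) for seq, mic in toy_records]

# Classify an unlabelled fragment by its three nearest labelled neighbours.
prediction = predict_knn("TCGAACGTACGA", toy_labelled, k=3)
print(prediction)
```

In this toy setup the sequences without MIC values play the role of the queries. A real pipeline would work on aligned 750 bp fragments and would need a properly validated model and breakpoint.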
