Evaluation of parameters affecting performance and reliability of machine learning-based antibiotic susceptibility testing from whole genome sequencing data.

Affiliations

Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America.

Centre for Genomic Pathogen Surveillance, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom.

Publication information

PLoS Comput Biol. 2019 Sep 3;15(9):e1007349. doi: 10.1371/journal.pcbi.1007349. eCollection 2019 Sep.

Abstract

Prediction of antibiotic resistance phenotypes from whole genome sequencing data by machine learning methods has been proposed as a promising platform for the development of sequence-based diagnostics. However, there has been no systematic evaluation of factors that may influence performance of such models, how they might apply to and vary across clinical populations, and what the implications might be in the clinical setting. Here, we performed a meta-analysis of seven large Neisseria gonorrhoeae datasets, as well as Klebsiella pneumoniae and Acinetobacter baumannii datasets, with whole genome sequence data and antibiotic susceptibility phenotypes using set covering machine classification, random forest classification, and random forest regression models to predict resistance phenotypes from genotype. We demonstrate how model performance varies by drug, dataset, resistance metric, and species, reflecting the complexities of generating clinically relevant conclusions from machine learning-derived models. Our findings underscore the importance of incorporating relevant biological and epidemiological knowledge into model design and assessment and suggest that doing so can inform tailored modeling for individual drugs, pathogens, and clinical populations. We further suggest that continued comprehensive sampling and incorporation of up-to-date whole genome sequence data, resistance phenotypes, and treatment outcome data into model training will be crucial to the clinical utility and sustainability of machine learning-based molecular diagnostics.
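The abstract's point that performance varies by dataset and by resistance metric can be made concrete with a small sketch. It assumes predicted and measured minimum inhibitory concentrations (MICs) are already available for each isolate, and it scores them two ways: as a binary resistant/susceptible call against a breakpoint, and as agreement within one two-fold dilution. The function names, the breakpoint value, the toy records, and the one-dilution criterion are illustrative assumptions, not details taken from the paper.

```python
from collections import defaultdict
from math import log2


def mic_to_label(mic, mic_breakpoint):
    """Binary phenotype: 1 (resistant) if the MIC exceeds the breakpoint, else 0."""
    return 1 if mic > mic_breakpoint else 0


def within_one_dilution(pred_mic, true_mic):
    """True if the predicted MIC is within one two-fold dilution of the measured MIC."""
    return abs(log2(pred_mic) - log2(true_mic)) <= 1.0


def per_dataset_metrics(records, mic_breakpoint):
    """Summarize performance separately for each dataset.

    records: iterable of (dataset_name, measured_mic, predicted_mic).
    """
    counts = defaultdict(
        lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0, "close": 0, "n": 0}
    )
    for dataset, true_mic, pred_mic in records:
        c = counts[dataset]
        truth = mic_to_label(true_mic, mic_breakpoint)
        pred = mic_to_label(pred_mic, mic_breakpoint)
        # "tp"/"tn" when the call matches the truth, "fp"/"fn" when it does not.
        key = ("t" if truth == pred else "f") + ("p" if pred else "n")
        c[key] += 1
        c["close"] += int(within_one_dilution(pred_mic, true_mic))
        c["n"] += 1

    summary = {}
    for dataset, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["tn"] + c["fp"]
        summary[dataset] = {
            "sensitivity": c["tp"] / positives if positives else None,
            "specificity": c["tn"] / negatives if negatives else None,
            "within_one_dilution": c["close"] / c["n"],
        }
    return summary


# Hypothetical toy data: (dataset, measured MIC, predicted MIC) in mg/L.
toy_records = [
    ("A", 4.0, 2.0),
    ("A", 0.25, 0.25),
    ("A", 2.0, 0.5),
    ("A", 0.5, 0.5),
    ("B", 8.0, 8.0),
    ("B", 0.125, 2.0),
    ("B", 0.5, 1.0),
]

# The breakpoint of 1.0 mg/L is a placeholder, not a clinical value.
results = per_dataset_metrics(toy_records, mic_breakpoint=1.0)
for name, metrics in sorted(results.items()):
    print(name, metrics)
```

In this toy input the two datasets rank differently depending on the metric: dataset A has perfect specificity but misses a resistant isolate, while dataset B catches every resistant isolate but miscalls a susceptible one. This is the kind of divergence that stratified reporting is meant to expose.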

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/04ac/6743791/59826f5d2641/pcbi.1007349.g001.jpg
