High performance Legionella pneumophila source attribution using genomics-based machine learning classification.

Affiliations

Department of Microbiology and Immunology, Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Victoria, Australia.

Center for Pathogen Genomics, University of Melbourne, Melbourne, Victoria, Australia.

Publication information

Appl Environ Microbiol. 2024 Mar 20;90(3):e0129223. doi: 10.1128/aem.01292-23. Epub 2024 Jan 30.

Abstract

Fundamental to effective Legionnaires' disease outbreak control is the ability to rapidly identify the environmental source(s) of the causative agent, Legionella pneumophila. Genomics has revolutionized pathogen surveillance, but L. pneumophila has a complex ecology and population structure that can limit source inference based on standard core genome phylogenetics. Here, we present a powerful machine learning approach that assigns the geographical source of Legionnaires' disease outbreaks more accurately than current core genome comparisons. Models were developed upon 534 L. pneumophila genome sequences, including 149 genomes linked to 20 previously reported Legionnaires' disease outbreaks through detailed case investigations. Our classification models were developed in a cross-validation framework using only environmental genomes. Assignments of the geographic origins of clinical isolates demonstrated high predictive sensitivity and specificity, with no false positives or false negatives for 13 of the 20 outbreak groups, despite polyclonal population structure within outbreaks. Analysis of the same 534-genome panel with a conventional phylogenomic tree and with a core genome multi-locus sequence typing (cgMLST) allelic distance-based classification approach showed that our machine learning method had the highest overall classification performance, measured as agreement with epidemiological information. Our multivariate statistical learning approach maximizes the use of genomic variation data and is thus well suited to supporting Legionnaires' disease outbreak investigations.

Importance

Identifying the sources of Legionnaires' disease outbreaks is crucial for effective control. Current genomic methods, while useful, often fall short because of the complex ecology and population structure of L. pneumophila, the causative agent. Our study introduces a high-performing machine learning approach for more accurate geographical source attribution of Legionnaires' disease outbreaks. Developed using cross-validation on environmental genomes, our models demonstrate excellent predictive sensitivity and specificity. Importantly, this new approach outperforms traditional methods such as phylogenomic trees and core genome multi-locus sequence typing, making more effective use of genomic variation data to infer outbreak sources. Our machine learning algorithms, which harness both core and accessory genomic variation, offer significant promise in public health settings. By enabling rapid and precise source identification in Legionnaires' disease outbreaks, such approaches could expedite intervention efforts and curtail disease transmission.
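The abstract benchmarks its models against an allelic distance-based classification and reports per-outbreak-group sensitivity and specificity. The sketch below illustrates those two ideas only; it is not the authors' machine learning method or their exact cgMLST procedure. It assumes each isolate is an equal-length list of integer allele calls (None for a missing call), and it uses a hypothetical nearest-reference rule with a distance `threshold`; the names `allelic_distance`, `assign_source`, `per_group_metrics` and the `site_*` labels are illustrative inventions.

```python
from typing import List, Optional, Sequence, Tuple

# Assumed representation: one integer allele call per cgMLST locus, None if missing.
Profile = Sequence[Optional[int]]


def allelic_distance(a: Profile, b: Profile) -> int:
    """Count loci whose allele calls differ, ignoring loci missing in either profile."""
    return sum(
        1 for x, y in zip(a, b) if x is not None and y is not None and x != y
    )


def assign_source(
    query: Profile, references: List[Tuple[str, Profile]], threshold: int
) -> Optional[str]:
    """Hypothetical rule: return the source label of the nearest environmental
    reference, or None if even the nearest is more than `threshold` alleles away."""
    best_label: Optional[str] = None
    best_dist: Optional[int] = None
    for label, profile in references:
        d = allelic_distance(query, profile)
        if best_dist is None or d < best_dist:  # ties keep the first reference seen
            best_label, best_dist = label, d
    if best_dist is None or best_dist > threshold:
        return None
    return best_label


def per_group_metrics(
    truth: Sequence[str], predicted: Sequence[Optional[str]], group: str
) -> Tuple[float, float]:
    """One-vs-rest sensitivity and specificity for a single outbreak group."""
    tp = fn = fp = tn = 0
    for t, p in zip(truth, predicted):
        if t == group:
            if p == group:
                tp += 1
            else:
                fn += 1  # missed or unassigned isolate from this group
        elif p == group:
            fp += 1  # isolate from another source wrongly given this group
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy, made-up profiles: two environmental sources, three clinical isolates.
    references = [("site_A", [1, 2, 3, 4]), ("site_B", [5, 6, 7, 8])]
    clinical = [[1, 2, 3, 9], [5, 6, 7, 8], [1, 9, 9, 9]]
    truth = ["site_A", "site_B", "site_A"]
    predicted = [assign_source(q, references, threshold=2) for q in clinical]
    print(predicted)  # ['site_A', 'site_B', None]
    for group in ("site_A", "site_B"):
        print(group, per_group_metrics(truth, predicted, group))
```

In this sketch an unassigned isolate counts as a false negative for its true group, which is one design choice among several. A real comparison would also need a principled threshold, a tie-breaking strategy, and cross-validated evaluation like the framework described above.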
