Machine learning reveals the dynamic importance of accessory sequences for outbreak clustering.

Author information

Liu Chao Chun, Hsiao William W L

Affiliations

Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia, Canada.

Faculty of Health Sciences, Simon Fraser University, Burnaby, British Columbia, Canada.

Publication information

mBio. 2025 Mar 12;16(3):e0265024. doi: 10.1128/mbio.02650-24. Epub 2025 Jan 28.

Abstract

Bacterial typing at whole-genome scales is now feasible owing to decreasing costs in high-throughput sequencing and the recent advances in computation. The unprecedented resolution of whole-genome typing is achieved by genotyping the variable segments of bacterial genomes that can fluctuate significantly in gene content. However, due to the transient and hypervariable nature of many accessory elements, the value of the added resolution in outbreak investigations remains disputed. To assess the analytical value of bacterial accessory genomes in clustering epidemiologically related cases, we trained classifiers on a set of genomes collected from 24 outbreaks of food, animal, or environmental origin. The models demonstrated high precision and recall on unseen test data with near-perfect accuracy in classifying clonal and short-term outbreaks. Annotating the genomic features important for cluster classification revealed functional enrichment of molecular fingerprints in genes involved in membrane transportation, trafficking, and carbohydrate metabolism. Importantly, we discovered polymorphisms in mobile genetic elements (MGEs) and gain/loss of MGEs to be informative in defining outbreak clusters. To quantify the ability of MGE variations to cluster outbreak clones, we devised a reference-free tree-building algorithm inspired by colored de Bruijn graphs, which enabled topological comparisons between MGE and standard typing methods. Systematic evaluation of clustering MGEs on an unseen dataset of 34 outbreaks yielded mixed results that exemplified the power of accessory sequence variations when core genomes of unrelated cases are insufficiently discriminatory, as well as the distortion of outbreak signals by microevolution events or the incomplete assembly of MGEs.

IMPORTANCE

Gene-by-gene typing is widely used to detect clusters of foodborne illnesses that share a common origin. It remains actively debated whether the inclusion of accessory sequences in bacterial typing schema is informative or deleterious for cluster definitions in outbreak investigations due to the potential confounding effects of horizontal gene transfer. By training machine learning models on a curated set of historical outbreaks, we revealed an enriched presence of outbreak distinguishing features in a wide range of mobile genetic elements. Systematic comparison of the efficacy of clustering different accessory elements against standard sequence typing methods led to our cataloging of scenarios where accessory sequence variations were beneficial and uninformative to resolving outbreak clusters. The presented work underscores the complexity of the molecular trends in enteric outbreaks and seeks to inspire novel computational ways to exploit whole-genome sequencing data in enteric disease surveillance and management.
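
The abstract mentions a reference-free tree-building algorithm inspired by colored de Bruijn graphs but does not give its details. As a rough illustration of the general idea only, and not the authors' method, the sketch below labels each k-mer with the set of samples ("colors") that contain it and derives pairwise Jaccard distances from shared k-mers. The function names, the toy sequences, and the choice of Jaccard distance are all assumptions made for this example; a distance table like this could then be passed to any standard tree builder.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def color_table(samples, k):
    """Map each k-mer to the set of sample names (its 'colors') containing it."""
    colors = {}
    for name, seq in samples.items():
        for km in kmers(seq, k):
            colors.setdefault(km, set()).add(name)
    return colors


def jaccard_distances(samples, k):
    """Pairwise Jaccard distance between samples, computed from k-mer colors.

    Hypothetical helper for illustration; no reference genome or alignment
    is used, only presence/absence of k-mers in each sample.
    """
    colors = color_table(samples, k)
    names = sorted(samples)
    total = {n: 0 for n in names}                      # distinct k-mers per sample
    shared = {pair: 0 for pair in combinations(names, 2)}
    for members in colors.values():
        for n in members:
            total[n] += 1
        for pair in combinations(sorted(members), 2):
            shared[pair] += 1                          # k-mer seen in both samples
    dist = {}
    for (a, b), s in shared.items():
        union = total[a] + total[b] - s
        dist[(a, b)] = 1.0 - s / union if union else 0.0
    return dist


# Toy sequences (made up): A and B are identical, D differs from A at the
# last base, and C shares no 4-mers with the others.
toy = {
    "A": "ACGTACGTGA",
    "B": "ACGTACGTGA",
    "C": "TTTTGGGGCC",
    "D": "ACGTACGTGT",
}
toy_dist = jaccard_distances(toy, k=4)
```

In this toy case the identical pair A and B is at distance 0, the unrelated sample C is at distance 1 from A, and the single-base variant D falls in between. Real mobile genetic element data would call for much longer k-mers and handling of reverse complements and assembly gaps, none of which is covered here.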

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b595/11898705/7a763b9c7d9f/mbio.02650-24.f001.jpg
