Tracking SARS-CoV-2 variants of concern in wastewater: an assessment of nine computational tools using simulated genomic data.

Affiliations

Department of Microbiology and Immunology, McGill University, Montreal, QC, Canada.

Environment and Climate Change Canada, Montreal, QC, Canada.

Publication information

Microb Genom. 2024 May;10(5). doi: 10.1099/mgen.0.001249.

Abstract

Wastewater-based surveillance (WBS) is an important epidemiological and public health tool for tracking pathogens across the scale of a building, neighbourhood, city, or region. WBS gained widespread adoption globally during the SARS-CoV-2 pandemic for estimating community infection levels by qPCR. Sequencing pathogen genes or genomes from wastewater adds information about pathogen genetic diversity, which can be used to identify viral lineages (including variants of concern) that are circulating in a local population. Capturing the genetic diversity by WBS sequencing is not trivial, as wastewater samples often contain a diverse mixture of viral lineages with real mutations and sequencing errors, which must be deconvoluted computationally from short sequencing reads. In this study we assess nine different computational tools that have recently been developed to address this challenge. We simulated 100 wastewater sequence samples consisting of SARS-CoV-2 BA.1, BA.2, and Delta lineages, in various mixtures, as well as a Delta-Omicron recombinant and a synthetic 'novel' lineage. Most tools performed well in identifying the true lineages present and estimating their relative abundances and were generally robust to variation in sequencing depth and read length. While many tools identified lineages present down to 1 % frequency, results were more reliable above a 5 % threshold. The presence of an unknown synthetic lineage, which represents an unclassified SARS-CoV-2 lineage, increases the error in relative abundance estimates of other lineages, but the magnitude of this effect was small for most tools. The tools also varied in how they labelled novel synthetic lineages and recombinants. While our simulated dataset represents just one of many possible use cases for these methods, we hope it helps users understand potential sources of error or bias in wastewater sequencing analysis and to appreciate the commonalities and differences across methods.
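The deconvolution step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a generic non-negative least-squares formulation and is not the method of any of the nine tools assessed. Each lineage is described by a binary mutation signature, and the observed allele frequencies in the mixed sample are modelled as a weighted sum of those signatures. The mutation labels (m1 to m6), the signature matrix, the function name `estimate_abundances`, and the step-size settings are all hypothetical and chosen only for illustration; they are not real lineage-defining mutations.

```python
# Hypothetical signature matrix: rows are mutations m1..m6, columns are lineages.
# 1 means the lineage carries the mutation. These are NOT real SARS-CoV-2 markers.
LINEAGES = ["BA.1", "BA.2", "Delta"]
SIGNATURES = [
    [1, 1, 0],  # m1: shared by both Omicron sublineages
    [1, 0, 0],  # m2: BA.1 only
    [0, 1, 0],  # m3: BA.2 only
    [0, 0, 1],  # m4: Delta only
    [0, 0, 1],  # m5: Delta only
    [1, 1, 0],  # m6: shared by both Omicron sublineages
]


def estimate_abundances(signatures, observed, steps=5000, lr=0.05):
    """Estimate lineage proportions by projected gradient descent.

    Minimises ||signatures @ x - observed||^2 subject to x >= 0,
    then rescales x so the proportions sum to 1.
    """
    n_mut = len(observed)
    n_lin = len(signatures[0])
    x = [1.0 / n_lin] * n_lin  # start from a uniform mixture
    for _ in range(steps):
        # Residual between predicted and observed allele frequencies.
        residual = [
            sum(signatures[i][j] * x[j] for j in range(n_lin)) - observed[i]
            for i in range(n_mut)
        ]
        # Gradient of the squared error (up to a constant factor).
        grad = [
            sum(signatures[i][j] * residual[i] for i in range(n_mut))
            for j in range(n_lin)
        ]
        # Gradient step, then projection onto the non-negative orthant.
        x = [max(0.0, x[j] - lr * grad[j]) for j in range(n_lin)]
    total = sum(x)
    return [v / total for v in x] if total > 0 else x


# Simulated, error-free allele frequencies for a 60/30/10 mixture.
TRUE_MIX = [0.6, 0.3, 0.1]
OBSERVED = [
    sum(row[j] * TRUE_MIX[j] for j in range(len(TRUE_MIX))) for row in SIGNATURES
]

ESTIMATE = estimate_abundances(SIGNATURES, OBSERVED)
for name, value in zip(LINEAGES, ESTIMATE):
    print(f"{name}: {value:.3f}")
```

With real data the observed frequencies carry sequencing error and uneven coverage, and mutations from a lineage absent from the signature matrix have nowhere to go. This is one way to see why an unknown lineage can inflate the error in the abundance estimates of the known ones.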

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9792/11165662/67ec63dee9ab/mgen-10-01249-g001.jpg
