Analytics & Data Science Directorate, UK Health Security Agency, London SW1P 3JR, UK.
Department of Infectious Disease, Imperial College London, London SW7 2AZ, UK.
Microb Genom. 2023 Apr;9(4). doi: 10.1099/mgen.0.000933.
Wastewater-based epidemiology has been used extensively throughout the COVID-19 (coronavirus disease 2019) pandemic to detect and monitor the spread and prevalence of SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) and its variants. It has proven an excellent complement to clinical sequencing, supporting the insights gained and helping to make informed public-health decisions. Consequently, many groups globally have developed bioinformatics pipelines to analyse sequencing data from wastewater. Accurate calling of mutations is critical in this process and in the assignment of circulating variants; yet, to date, the performance of variant-calling algorithms in wastewater samples has not been investigated. To address this, we compared the performance of six variant callers (VarScan, iVar, GATK, FreeBayes, LoFreq and BCFtools), used widely in bioinformatics pipelines, on 19 synthetic samples with known ratios of three different SARS-CoV-2 variants of concern (VOCs) (Alpha, Beta and Delta), as well as 13 wastewater samples collected in London between 15 and 18 December 2021. We used the fundamental parameters of recall (sensitivity) and precision (specificity) to confirm the presence of mutational profiles defining specific variants across the six variant callers. Our results show that BCFtools, FreeBayes and VarScan found the expected variants with higher precision and recall than GATK or iVar, although the latter identified more expected defining mutations than other callers. LoFreq gave the least reliable results due to the high number of false-positive mutations detected, resulting in lower precision. Similar results were obtained for both the synthetic and wastewater samples.
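As a rough illustration of the two metrics named in the abstract, the sketch below computes precision and recall for a single caller by comparing the set of mutations it reports against the set of mutations expected for a variant. It is not the authors' evaluation code, and the abstract does not describe their exact procedure. The function name `precision_recall` and the mutation labels are hypothetical and chosen only for the example; precision is taken as TP / (TP + FP) and recall as TP / (TP + FN).

```python
def precision_recall(called, expected):
    """Return (precision, recall) for one caller on one sample.

    called   -- mutations reported by the variant caller
    expected -- defining mutations known to be present in the sample
    """
    called, expected = set(called), set(expected)
    tp = len(called & expected)   # expected mutations that were called
    fp = len(called - expected)   # called mutations that were not expected
    fn = len(expected - called)   # expected mutations that were missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Illustrative labels only; not data from the study.
expected = {"A23063T", "C23271A", "C23604A", "C23709T"}
called = {"A23063T", "C23271A", "G11111T"}

precision, recall = precision_recall(called, expected)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the caller recovers two of the four expected mutations and reports one unexpected mutation. A caller that reports many unexpected mutations, as the abstract describes for LoFreq, would lose precision even if its recall stayed high.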