Ortiz Arturo Torres, Kendall Michelle, Storey Nathaniel, Hatcher James, Dunn Helen, Roy Sunando, Williams Rachel, Williams Charlotte, Goldstein Richard A, Didelot Xavier, Harris Kathryn, Breuer Judith, Grandjean Louis
Department of Infectious Diseases, Imperial College London, London, W2 1NY.
Department of Statistics, University of Warwick, Coventry, CV4 7AL.
bioRxiv. 2022 Jun 7:2022.06.07.495142. doi: 10.1101/2022.06.07.495142.
Accurate inference of who infected whom in an infectious disease outbreak is critical for the delivery of effective infection prevention and control. The increased resolution of pathogen whole-genome sequencing has significantly improved our ability to infer transmission events. Despite this, transmission inference often remains limited by the lack of genomic variation between the source case and infected contacts. Although within-host genetic diversity is common among a wide variety of pathogens, conventional whole-genome sequencing phylogenetic approaches to reconstructing outbreaks exclusively use consensus sequences, which consider only the most prevalent nucleotide at each position and therefore fail to capture low-frequency variation within samples. We hypothesized that including within-sample variation in a phylogenetic model would help to identify who infected whom in cases where this was previously impossible. Using whole-genome sequences from multi-institutional SARS-CoV-2 outbreaks as an example, we show that within-sample diversity is stable among repeated serial samples from the same host and is transmitted between cases with known epidemiological links, and that including it improves phylogenetic inference and our understanding of who infected whom. Our technique is applicable to other infectious diseases and has immediate clinical utility in infection prevention and control.
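The contrast between consensus sequences and within-sample variation can be illustrated with a minimal sketch. This is not the phylogenetic model described in the abstract; it is a toy example under stated assumptions. The per-position read counts, the 5% frequency threshold, and the function names (`consensus`, `minor_variants`, `shared_minor_variants`) are all hypothetical and chosen only for illustration.

```python
def consensus_base(counts):
    """Return the most frequent base at one position."""
    return max(counts, key=counts.get)


def consensus(sample):
    """Build a consensus sequence from per-position base counts."""
    return "".join(consensus_base(counts) for counts in sample)


def minor_variants(sample, min_freq=0.05):
    """Return {(position, base)} for non-consensus bases at or above min_freq.

    min_freq is an assumed, illustrative threshold.
    """
    variants = set()
    for pos, counts in enumerate(sample):
        total = sum(counts.values())
        major = consensus_base(counts)
        for base, n in counts.items():
            if base != major and n / total >= min_freq:
                variants.add((pos, base))
    return variants


def shared_minor_variants(sample_1, sample_2, min_freq=0.05):
    """Return the sorted minor variants present in both samples."""
    shared = minor_variants(sample_1, min_freq) & minor_variants(sample_2, min_freq)
    return sorted(shared)


# Hypothetical read counts at three genome positions for three samples.
sample_a = [{"A": 100}, {"C": 80, "T": 20}, {"G": 100}]
sample_b = [{"A": 100}, {"C": 90, "T": 10}, {"G": 100}]
sample_c = [{"A": 100}, {"C": 100}, {"G": 100}]

# All three consensus sequences are identical, so they cannot
# distinguish which pair of cases is more closely linked.
print(consensus(sample_a), consensus(sample_b), consensus(sample_c))

# Only samples a and b share a low-frequency variant.
print(shared_minor_variants(sample_a, sample_b))
print(shared_minor_variants(sample_a, sample_c))
```

In this toy data, the three samples are indistinguishable at the consensus level, while the minor variant shared by `sample_a` and `sample_b` is the kind of low-frequency signal that consensus-only approaches discard.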