Nuffield Department of Clinical Medicine, University of Oxford, John Radcliffe Hospital, Oxford, United Kingdom.
PLoS Comput Biol. 2013;9(5):e1003059. doi: 10.1371/journal.pcbi.1003059. Epub 2013 May 2.
Bacterial whole genome sequencing offers the prospect of rapid and high-precision investigation of infectious disease outbreaks. Close genetic relationships between microorganisms isolated from different infected cases suggest transmission is a strong possibility, whereas transmission between cases with genetically distinct bacterial isolates can be excluded. However, undetected mixed infections (infection with ≥2 unrelated strains of the same species where only one is sequenced) potentially impair exclusion of transmission with certainty, and may therefore limit the utility of this technique. We investigated the problem by developing a computationally efficient method for detecting mixed infection without the need for resource-intensive independent sequencing of multiple bacterial colonies. Given the relatively low density of single nucleotide polymorphisms within bacterial sequence data, direct reconstruction of mixed infection haplotypes from current short-read sequence data is not consistently possible. We therefore use a two-step maximum likelihood-based approach, assuming each sample contains up to two infecting strains. We jointly estimate the proportion of the infection arising from the dominant and minor strains, and the sequence divergence between these strains. In cases where mixed infection is confirmed, the dominant and minor haplotypes are then matched to a database of previously sequenced local isolates. We demonstrate the performance of our algorithm with in silico and in vitro mixed infection experiments, and apply it to transmission of an important healthcare-associated pathogen, Clostridium difficile. Using hospital ward movement data in a previously described stochastic transmission model, 15 pairs of cases enriched for likely transmission events associated with mixed infection were selected.
Our method identified four previously undetected mixed infections, and a previously undetected transmission event, but no direct transmission between the pairs of cases under investigation. These results demonstrate that mixed infections can be detected without additional sequencing effort, and this will be important in assessing the extent of cryptic transmission in our hospitals.
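As an illustration only, the joint estimation step described in the abstract can be sketched as a per-site binomial mixture fitted by grid search. This is not the authors' implementation: the abstract does not specify the likelihood, so the mixture form, the fixed sequencing error rate `err`, the grids, and all function names below are assumptions made for this sketch. Each site is summarised as (reads supporting the non-dominant base, total reads), `m` is the minor strain proportion, and `d` is the per-site divergence between the two strains. The second step, matching haplotypes to a database of local isolates, is not covered.

```python
import math
import random


def log_binom_pmf(k, n, q):
    """Log probability of k successes in n Bernoulli(q) trials."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(q) + (n - k) * math.log(1.0 - q)


def null_log_likelihood(sites, err=0.01):
    """Single-strain model: non-dominant reads arise from error only."""
    return sum(log_binom_pmf(k, n, err) for k, n in sites)


def log_likelihood(sites, m, d, err=0.01):
    """Two-strain model (assumed form): with probability d a site differs
    between strains, and non-dominant reads then appear at rate q_diff."""
    q_diff = m * (1.0 - err) + (1.0 - m) * err
    total = 0.0
    for k, n in sites:
        a = math.log(d) + log_binom_pmf(k, n, q_diff)
        b = math.log(1.0 - d) + log_binom_pmf(k, n, err)
        hi = max(a, b)  # log-sum-exp for numerical stability
        total += hi + math.log(math.exp(a - hi) + math.exp(b - hi))
    return total


def fit_mixture(sites, err=0.01):
    """Grid-search maximum likelihood estimate; returns (loglik, m, d)."""
    m_grid = [i / 100 for i in range(2, 51)]          # 0.02 .. 0.50
    d_grid = [10 ** (-x / 4) for x in range(2, 21)]   # ~0.32 .. 1e-5
    return max(
        (log_likelihood(sites, m, d, err), m, d)
        for m in m_grid
        for d in d_grid
    )


def simulate_sites(num_sites, depth, m, d, err=0.01, seed=0):
    """Simulate read counts per site under the assumed two-strain model."""
    rng = random.Random(seed)
    q_diff = m * (1.0 - err) + (1.0 - m) * err
    sites = []
    for _ in range(num_sites):
        q = q_diff if rng.random() < d else err
        k = sum(rng.random() < q for _ in range(depth))
        sites.append((k, depth))
    return sites


# Usage: simulate a 70/30 mixture and recover the parameters.
sites = simulate_sites(num_sites=400, depth=40, m=0.3, d=0.1, seed=1)
loglik, m_hat, d_hat = fit_mixture(sites)
deviance = 2.0 * (loglik - null_log_likelihood(sites))
print(f"minor proportion ~ {m_hat:.2f}, divergence ~ {d_hat:.3f}, deviance = {deviance:.1f}")
```

A large deviance relative to the single-strain model would point towards mixed infection under these assumptions. A real analysis would need a calibrated threshold and an error model fitted to the sequencing platform, neither of which is attempted here.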