Walter Katharine S, Cohen Ted, Mathema Barun, Colijn Caroline, Sobkowiak Benjamin, Comas Iñaki, Goig Galo A, Croda Julio, Andrews Jason R
Division of Epidemiology, University of Utah, Salt Lake City, UT, USA.
Department of Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT, USA.
Lancet Microbe. 2025 Jan;6(1):100936. doi: 10.1016/j.lanmic.2024.06.003. Epub 2024 Nov 28.
Mycobacterium tuberculosis complex (MTBC) species evolve slowly, so isolates from individuals linked in transmission often have identical or nearly identical genomes, making it difficult to reconstruct transmission chains. Finding additional sources of shared MTBC variation could help overcome this problem. Previous studies have reported MTBC diversity within infected individuals; however, whether within-host variation improves transmission inferences remains unclear. Here, we aimed to quantify within-host MTBC variation and assess whether such information improves transmission inferences.
We conducted a retrospective genomic epidemiology study in which we reanalysed publicly available sequence data from household transmission studies indexed in PubMed from database inception until Jan 31, 2024, for which both genomic and epidemiological contact data were available, using household membership as a proxy for transmission linkage. We quantified minority variants (ie, positions with two or more alleles, each supported by at least five-fold coverage, and with a minor allele frequency of 1% or more) outside PE and PPE genes, both within individual samples and shared across samples. We used receiver operating characteristic (ROC) curves to compare the performance of a generalised linear model of household membership that included shared minority variants with that of a model that included only fixed genetic differences.
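The minority-variant definition above can be sketched as follows. This is a hypothetical illustration, not the authors' pipeline. It assumes per-position allele read counts are already available (for example, from a pileup of aligned reads). It reads "five-fold coverage" as at least five supporting reads per allele and takes the most-supported allele at each position as the major allele. A variant counts as shared only when two samples carry the same minor allele at the same position, and PE/PPE positions are passed in as an excluded set. All names and the toy read counts are made up.

```python
from typing import AbstractSet, Dict, Set, Tuple

# Hypothetical input format (an assumption, not the authors' pipeline):
# genome position -> {allele: number of reads supporting that allele}.
AlleleCounts = Dict[int, Dict[str, int]]

MIN_ALLELE_DEPTH = 5   # each allele must be supported by >= 5 reads
MIN_MINOR_FREQ = 0.01  # minor allele frequency must be >= 1%


def minority_variants(
    counts: AlleleCounts, excluded: AbstractSet[int] = frozenset()
) -> Set[Tuple[int, str]]:
    """Return (position, minor allele) pairs that pass both thresholds."""
    calls: Set[Tuple[int, str]] = set()
    for pos, alleles in counts.items():
        if pos in excluded:  # e.g. positions falling in PE/PPE genes
            continue
        depth = sum(alleles.values())
        # Keep only alleles with enough supporting reads.
        supported = {a: n for a, n in alleles.items() if n >= MIN_ALLELE_DEPTH}
        if len(supported) < 2:
            continue  # fixed site, or minor allele below the read threshold
        major = max(supported, key=lambda a: supported[a])
        for allele, n in supported.items():
            if allele != major and n / depth >= MIN_MINOR_FREQ:
                calls.add((pos, allele))
    return calls


def shared_minority_variants(
    a: AlleleCounts, b: AlleleCounts, excluded: AbstractSet[int] = frozenset()
) -> int:
    """Count minority variants (same position and allele) found in both samples."""
    return len(minority_variants(a, excluded) & minority_variants(b, excluded))


# Toy read counts, made up for illustration.
sample_a = {100: {"A": 95, "G": 5}, 150: {"A": 1000, "C": 5}, 200: {"C": 100}}
sample_b = {100: {"A": 90, "G": 10}, 250: {"C": 80, "T": 20}}
print(shared_minority_variants(sample_a, sample_b))  # 1 (position 100, allele G)
```

In this toy example, position 150 in `sample_a` is not called because its minor allele, although supported by five reads, falls below 1% of the total depth.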
We identified three MTBC household transmission studies with publicly available whole-genome sequencing data and epidemiological linkages: a household transmission study in Vitória, Brazil (Colangeli et al), a retrospective population-based study of paediatric tuberculosis in British Columbia, Canada (Guthrie et al), and a retrospective population-based study in Oxfordshire, England (Walker et al). We found moderate levels of minority variation in MTBC sequence data from cultured isolates, and these levels varied significantly across studies: mean 168·6 minority variants (95% CI 151·4-185·9) for the Colangeli et al dataset, 5·8 (1·5-10·2) for Guthrie et al (p<0·0001, Wilcoxon rank sum test, vs Colangeli et al), and 7·1 (2·4-11·9) for Walker et al (p<0·0001, Wilcoxon rank sum test, vs Colangeli et al). Isolates from household pairs shared more minority variants than did randomly selected pairs of isolates: mean 97·7 shared minority variants (79·1-116·3) versus 9·8 (8·6-11·0) in Colangeli et al, 0·8 (0·1-1·5) versus 0·2 (0·1-0·2) in Guthrie et al, and 0·7 (0·1-1·3) versus 0·2 (0·2-0·2) in Walker et al (all p<0·0001, Wilcoxon rank sum test). Shared within-host variation was significantly associated with household membership (odds ratio 1·51 per one standard deviation increase in shared minority variants [95% CI 1·30-1·71], p<0·0001). Compared with models that included only fixed genetic differences, models that also included shared within-host variation predicted household membership more accurately in all three studies: area under the ROC curve 0·95 versus 0·92 for the Colangeli et al study, 0·99 versus 0·95 for the Guthrie et al study, and 0·93 versus 0·91 for the Walker et al study.
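The two summary measures reported above can be illustrated with a small sketch. It relies on two standard facts rather than on anything specific to this study. First, the area under the ROC curve equals the Mann-Whitney U statistic divided by the product of the two group sizes, with ties counted as one half. Second, a logistic-regression coefficient β estimated on the raw scale of a predictor corresponds to an odds ratio of exp(β × SD) per one standard deviation increase. The function names and toy scores are hypothetical, and the study's actual models and covariates are not reproduced here.

```python
import math
from typing import Sequence


def auc_mann_whitney(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive pair (e.g. a
    household pair) scores higher than a randomly chosen negative pair;
    ties count as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def odds_ratio_per_sd(beta_per_unit: float, sd: float) -> float:
    """Convert a logistic-regression coefficient (log-odds per one raw unit of
    the predictor) into an odds ratio per one standard deviation of it."""
    return math.exp(beta_per_unit * sd)


# Toy scores, made up for illustration: predicted probabilities of household
# membership for household pairs (positives) and random pairs (negatives).
household = [0.9, 0.8, 0.6]
random_pairs = [0.6, 0.3, 0.2]
print(auc_mann_whitney(household, random_pairs))  # 8.5 / 9, about 0.944
```

The quadratic pairwise loop is chosen for clarity. For large numbers of isolate pairs, a rank-based computation of the same statistic would be more efficient.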
Within-host MTBC variation persists through culture of sputum and could enhance the resolution of transmission inferences. The substantial differences in minority variation recovered across studies highlight the need to optimise approaches to recover and incorporate within-host variation into automated phylogenetic and transmission inference.
National Institutes of Health.