Department of Plant and Microbial Biology, University of Zurich, Zurich, Switzerland.
Elife. 2022 Jun 28;11:e76780. doi: 10.7554/eLife.76780.
Detecting factors associated with transmission is important for understanding disease epidemics and for designing effective public health measures. Clustering and terminal branch length (TBL) analyses are commonly applied to genomic data sets of Mycobacterium tuberculosis (MTB) to identify sub-populations with increased transmission. Here, I used a simulation-based approach to investigate which epidemiological processes influence the results of clustering and TBL analyses, and whether differences in transmission can be detected with these methods. I simulated MTB epidemics with different dynamics (latency, infectious period, transmission rate, basic reproductive number R0, sampling proportion, sampling period, and molecular clock), and found that all considered factors, except for the length of the infectious period, affect clustering results and TBL distributions. I show that standard interpretations of this type of analysis ignore two main caveats: (1) clustering results and TBL depend on many factors that have nothing to do with transmission, and (2) clustering results and TBL reveal nothing about whether the epidemic is stable, growing, or shrinking, unless all the additional parameters that influence these metrics are known or assumed to be identical between sub-populations. An important consequence is that the optimal SNP threshold for clustering depends on the epidemiological conditions, and that sub-populations with different epidemiological characteristics should not be analyzed with the same threshold. Finally, these results suggest that the different clustering rates and TBL distributions found consistently between MTB lineages are probably due to intrinsic bacterial factors, and do not necessarily indicate differences in transmission or evolutionary success.
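To make the role of the SNP threshold concrete, the following is a minimal sketch of threshold-based clustering on a pairwise SNP distance matrix. It is not the method used in the study: the abstract does not specify a clustering algorithm, so single linkage is assumed here, and the function names (`snp_clusters`, `clustering_rate`) and the distance matrix `TOY_DIST` are hypothetical, invented purely for illustration.

```python
from itertools import combinations


def snp_clusters(dist, threshold):
    """Group samples by single linkage (an assumption of this sketch): two
    samples share a cluster when a chain of pairwise SNP distances, each
    <= threshold, connects them."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of samples whose distance is within the threshold.
    for i, j in combinations(range(n), 2):
        if dist[i][j] <= threshold:
            parent[find(i)] = find(j)

    # Collect sample indices by cluster root.
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def clustering_rate(dist, threshold):
    """Fraction of samples that fall in a cluster of two or more samples."""
    clusters = snp_clusters(dist, threshold)
    clustered = sum(len(c) for c in clusters if len(c) > 1)
    return clustered / len(dist)


# Hypothetical pairwise SNP distances for five samples (illustrative only,
# not data from the study).
TOY_DIST = [
    [0, 3, 8, 20, 25],
    [3, 0, 6, 22, 24],
    [8, 6, 0, 21, 23],
    [20, 22, 21, 0, 11],
    [25, 24, 23, 11, 0],
]
```

Calling `clustering_rate(TOY_DIST, 5)` and `clustering_rate(TOY_DIST, 12)` on the same toy matrix gives different clustering rates, which illustrates why a rate is only interpretable relative to the threshold chosen and the conditions that generated the distances.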