Hodcroft Emma B, Wohlfender Martin S, Neher Richard A, Riou Julien, Althaus Christian L
Institute of Social and Preventive Medicine, University of Bern, Bern, Switzerland.
Multidisciplinary Center for Infectious Diseases, University of Bern, Bern, Switzerland.
PLoS Comput Biol. 2025 Apr 15;21(4):e1012960. doi: 10.1371/journal.pcbi.1012960. eCollection 2025 Apr.
The wealth of genomic data generated during the COVID-19 pandemic provides an exceptional opportunity to obtain information on the transmission of SARS-CoV-2. Specifically, there is great interest in better understanding how the effective reproduction number $R_e$ and the overdispersion of secondary cases, which can be quantified by the negative binomial dispersion parameter k, changed over time and across regions and viral variants. The aim of our study was to develop a Bayesian framework to infer $R_e$ and k from viral sequence data. First, we developed a mathematical model for the size distribution of identical sequence clusters, in which we integrated viral transmission, the mutation rate of the virus, and incomplete case detection. Second, we implemented this model within a Bayesian inference framework, allowing the estimation of $R_e$ and k from genomic data only. We validated this model in a simulation study. Third, we identified clusters of identical sequences in all SARS-CoV-2 sequences from Switzerland, Denmark, and Germany in 2021 that were available on GISAID. We obtained monthly estimates of the posterior distribution of $R_e$ and k, with the resulting $R_e$ estimates slightly lower than estimates obtained by other methods, and k comparable with previous results. We found comparatively higher estimates of k in Denmark, which suggests fewer opportunities for superspreading and more controlled transmission than in the other countries in 2021. Our model included an estimation of the case detection and sampling probability, but the estimates obtained had large uncertainty, reflecting the difficulty of estimating these parameters simultaneously. Our study presents a novel method to infer information on the transmission of infectious diseases and its heterogeneity using genomic data. With the increasing availability of pathogen sequences in the future, we expect that our method has the potential to provide new insights into the transmission and the overdispersion in secondary cases of other pathogens.
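To make the idea of an identical sequence cluster concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes a simple branching process in which each case causes a negative binomial number of secondary cases (mean `R`, dispersion `k`), each secondary case carries a mutated sequence with probability `p_mut` (and then leaves the cluster), and each case is detected and sequenced independently with probability `p_detect`. The function names and the parameters `p_mut` and `p_detect` are illustrative assumptions; the analytic cluster size distribution and the Bayesian inference described in the abstract are not reproduced here.

```python
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) variate by counting unit-rate arrivals in [0, lam]."""
    n = 0
    t = rng.expovariate(1.0)
    while t < lam:
        n += 1
        t += rng.expovariate(1.0)
    return n


def negative_binomial(rng, mean, k):
    """Draw from a negative binomial with the given mean and dispersion k,
    using the gamma-Poisson mixture (smaller k means more overdispersion)."""
    lam = rng.gammavariate(k, mean / k)  # shape k, scale mean/k
    return poisson(rng, lam)


def simulate_cluster(rng, R, k, p_mut, p_detect, max_cases=10_000):
    """Simulate one identical sequence cluster and return its observed size.

    Hypothetical simplification: a secondary case either inherits the
    identical sequence (probability 1 - p_mut) or mutates and is dropped
    from the cluster. max_cases is a safety cap for near-critical settings.
    """
    active = 1      # cases with the identical sequence that have not yet transmitted
    total = 0       # cases processed so far
    observed = 0    # cases that were detected and sequenced
    while active > 0 and total < max_cases:
        active -= 1
        total += 1
        if rng.random() < p_detect:
            observed += 1
        offspring = negative_binomial(rng, R, k)
        # Keep only offspring that still carry the identical sequence.
        active += sum(rng.random() >= p_mut for _ in range(offspring))
    return observed


if __name__ == "__main__":
    rng = random.Random(1)
    sizes = [simulate_cluster(rng, R=0.8, k=0.3, p_mut=0.3, p_detect=0.5)
             for _ in range(10_000)]
    seen = [s for s in sizes if s > 0]  # clusters with no sequenced case are invisible
    print("observed clusters:", len(seen))
    print("largest observed cluster:", max(seen))
    print("share of singletons:", sum(s == 1 for s in seen) / len(seen))
```

In this sketch, lowering `k` at a fixed `R` produces more singletons together with a few very large clusters, which is the kind of signal in the cluster size distribution that an inference framework could exploit. Incomplete detection shrinks and hides clusters, which illustrates why detection and transmission parameters are hard to estimate at the same time.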