Institute for Molecular Bacteriology, TWINCORE Centre for Experimental and Clinical Infection Research, a joint venture between the Hannover Medical School (MHH) and the Helmholtz Centre for Infection Research (HZI), Hannover, Germany.
Cluster of Excellence RESIST (EXC 2155), Hannover Medical School (MHH), Hannover, Germany.
Genome Biol. 2024 Aug 22;25(1):228. doi: 10.1186/s13059-024-03355-y.
The emergence of the SARS-CoV-2 virus has highlighted the importance of genomic epidemiology in understanding the evolution of pathogens and guiding public health interventions. The Omicron variant in particular has underscored the role of epistasis in the evolution of lineages with both higher infectivity and immune escape, and therefore the need to update surveillance pipelines so that such lineages are detected early.
In this study, we apply a method based on mutual information between positions in a multiple sequence alignment, which can scale to millions of samples. We show that it reliably predicts known, experimentally validated epistatic interactions even when using as few as 10,000 sequences, which opens the possibility of using it as a near real-time prediction system. We test this possibility by modifying the method to account for the sample collection date and applying it retrospectively to multiple sequence alignments for each month between March 2020 and March 2023. We detected a cornerstone epistatic interaction in the Spike protein between codons 498 and 501 as soon as seven samples with a double mutation were present in the dataset, demonstrating the method's sensitivity. Finally, we assess the method's ability to make inferences about emerging interactions by experimentally validating candidates predicted after March 2023.
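To illustrate the core quantity, the following is a minimal sketch of mutual information between pairs of alignment columns. It is not the authors' implementation: the function names (`mutual_information`, `column`, `pairwise_mi`) and the toy alignment are assumptions made for illustration, and the sketch omits the scaling to millions of samples and the handling of collection dates described above.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    Each column is a sequence of residues, one per sample, in the same order.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must cover the same samples")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (n_ab / n) * log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


def column(alignment, index):
    """Extract one column from an alignment given as equal-length strings."""
    return [seq[index] for seq in alignment]


def pairwise_mi(alignment):
    """Score every pair of positions (i < j) in a small alignment."""
    length = len(alignment[0])
    scores = {}
    for i in range(length):
        for j in range(i + 1, length):
            scores[(i, j)] = mutual_information(
                column(alignment, i), column(alignment, j)
            )
    return scores


# Hypothetical toy alignment: positions 0 and 1 always change together,
# position 2 varies only partly in step with them.
toy_alignment = ["QNA", "QNA", "RYA", "RYC"]
toy_scores = pairwise_mi(toy_alignment)
```

In this toy case the perfectly co-varying pair (0, 1) receives the highest score, which is the kind of signal that a candidate epistatic pair would show. The all-pairs loop is quadratic in alignment length, so it is only suitable for illustration.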
We show how known epistatic interactions in SARS-CoV-2 can be detected with high sensitivity, and how emerging ones can be quickly prioritized for experimental validation, an approach that could be implemented downstream of pandemic genome sequencing efforts.