Beijing Institute of Biotechnology, State Key Laboratory of Pathogen and Biosecurity, Beijing, China.
Front Cell Infect Microbiol. 2023 Apr 19;13:1161445. doi: 10.3389/fcimb.2023.1161445. eCollection 2023.
Driven by various mutations in the viral Spike protein, diverse variants of SARS-CoV-2 have repeatedly emerged and prevailed, significantly prolonging the pandemic. This phenomenon calls for identifying the key Spike mutations that enhance viral fitness. To address this need, this manuscript formulates a well-defined framework of causal inference methods for evaluating and identifying the Spike mutations that contribute to the viral fitness of SARS-CoV-2. Applied to large-scale SARS-CoV-2 genome data, the framework estimates the statistical contribution of each mutation to viral fitness across lineages and thereby identifies important mutations. The identified key mutations are further validated by computational methods as having functional effects, including effects on Spike stability, receptor-binding affinity, and potential for immune escape. Based on the effect score of each mutation, individual key fitness-enhancing mutations such as D614G and T478K are identified and studied. Moving from individual mutations to protein domains, this paper recognizes key regions of the Spike protein, including the receptor-binding domain and the N-terminal domain. The research goes further and uses the mutational effect scores on viral fitness to compute the fitness scores of different SARS-CoV-2 strains and predict their transmission capacity based solely on their viral sequences. This fitness prediction has been validated using BA.2.12.1, which was not used for regression training but fits the prediction well. To the best of our knowledge, this is the first study to apply causal inference models to mutational analysis of large-scale SARS-CoV-2 genomes. Our findings provide innovative and systematic insights into SARS-CoV-2, promote functional studies of its key mutations, and serve as reliable guidance on mutations of interest.
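As a rough illustration of the final step (scoring a strain from its sequence alone), the sketch below assumes an additive model in which a strain's fitness score is simply the sum of the effect scores of the Spike substitutions it carries. This is an assumption made for illustration; it is not the paper's causal inference framework. The helper names `call_mutations` and `fitness_score`, the toy three-residue fragment and its numbering, and the effect values are all hypothetical and are not the paper's estimates.

```python
from typing import Dict, List


def call_mutations(reference: str, query: str, offset: int = 1) -> List[str]:
    """Return substitutions such as 'D614G' between two aligned protein sequences.

    Assumes (hypothetically) that both sequences are already aligned to equal
    length with no indels, and that residue numbering starts at `offset`.
    Gap ('-') and unknown ('X') residues in the query are skipped.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to equal length")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, query), start=offset)
        if ref != alt and alt not in "-X"
    ]


def fitness_score(mutations: List[str], effect_scores: Dict[str, float]) -> float:
    """Additive sketch: sum the effect scores of the mutations a strain carries.

    Mutations with no estimated effect contribute 0.0; interactions between
    mutations (epistasis) are ignored under this assumption.
    """
    return sum((effect_scores.get(m, 0.0) for m in mutations), 0.0)


# Toy usage with made-up numbers (not estimates from the paper).
toy_effects = {"D614G": 0.5, "T478K": 0.25}
muts = call_mutations("QDV", "QGV", offset=613)  # toy fragment numbered for illustration
print(muts, fitness_score(muts, toy_effects))
```

Here `call_mutations("QDV", "QGV", offset=613)` yields `["D614G"]`, and the strain's score is looked up from the hypothetical table. Treating unseen mutations as neutral keeps the score defined for any sequence, at the cost of ignoring mutations the model never observed during training.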