Laboratoire de Physique de l'Ecole Normale Supérieure, PSL & CNRS UMR8063, Sorbonne Université, Université de Paris, Paris, France.
School of Molecular Sciences and Center for Molecular Design and Biomimetics, The Biodesign Institute, Arizona State University, Tempe, AZ, USA.
Mol Biol Evol. 2021 May 19;38(6):2428-2445. doi: 10.1093/molbev/msab036.
COVID-19 can lead to acute respiratory syndrome, which can be due to dysregulated immune signaling. We analyze the distribution of CpG dinucleotides, a pathogen-associated molecular pattern, in the SARS-CoV-2 genome. We characterize CpG content by a CpG force that accounts for statistical constraints acting on the genome at the nucleotide and amino acid levels. The CpG force, like the CpG content, is low overall compared with other pathogenic betacoronaviruses; however, it fluctuates widely along the genome, with a particularly low value, comparable with that of the circulating seasonal HKU1, in the spike coding region and a greater value, comparable with those of SARS and MERS, in the highly expressed nucleocapsid coding region (N ORF), whose transcripts are relatively abundant in the cytoplasm of infected cells and present in the 3'UTRs of all subgenomic RNAs. This dual nature of CpG content could give SARS-CoV-2 the ability to avoid triggering pattern recognition receptors upon entry while eliciting a stronger response during replication. We then investigate the evolution of synonymous mutations since the outbreak of the COVID-19 pandemic, finding a signature of CpG loss in regions with a greater CpG force. Sequence motifs preceding the CpG-loss-associated loci in the N ORF match recently identified binding patterns of the zinc finger antiviral protein. Using a model of viral gene evolution under human host pressure, we find that synonymous mutations in the SARS-CoV-2 genome, and particularly in the N ORF, seem to be driven by the viral codon bias, the transition-transversion bias, and the pressure to lower CpG content.
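The abstract does not give the formula for the CpG force, so the sketch below does not attempt to reproduce it. As a rough illustration of how CpG content can be profiled along a genome, it computes the simpler observed/expected CpG ratio in sliding windows. This is a swapped-in proxy, not the authors' measure: it corrects only for the C and G frequencies in each window and ignores amino acid constraints. The function names, window size, and step are hypothetical choices made for this example.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio of a sequence: N_CG * L / (N_C * N_G).

    A simple proxy for CpG depletion or enrichment; NOT the CpG force
    described in the abstract.
    """
    seq = seq.upper()
    n_c = seq.count("C")
    n_g = seq.count("G")
    n_cg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if n_c == 0 or n_g == 0:
        return 0.0  # no CpG is possible without both C and G
    return n_cg * len(seq) / (n_c * n_g)


def sliding_cpg(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """Return (start, ratio) for every full window along the sequence."""
    return [
        (start, cpg_obs_exp(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence invented for illustration only (not SARS-CoV-2 data):
# a CpG-rich stretch followed by a CpG-free one.
toy_profile = sliding_cpg("CGCGATAT", window=4, step=4)
```

On a real genome one would pass the full sequence with a window of a few hundred to a few thousand nucleotides. A ratio well below 1 in a region indicates CpG depletion relative to what the local C and G content would predict.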