Department of Bioinformatics and Biostatistics, School of Public Health and Information Sciences, University of Louisville, Louisville, KY, United States of America.
Division of Biostatistics, College of Public Health, The Ohio State University, Columbus, OH, United States of America.
PLoS One. 2023 Sep 14;18(9):e0291271. doi: 10.1371/journal.pone.0291271. eCollection 2023.
Study of the genome of the SARS-CoV-2 virus, particularly with regard to understanding evolution of the virus, is crucial for managing the COVID-19 pandemic. To this end, we sample viral genomes from the GISAID repository and use several of the maximum likelihood approaches implemented in PAML, a collection of open source programs for phylogenetic analyses of DNA and protein sequences, to assess evidence for positive selection in the protein-coding regions of the SARS-CoV-2 genome. Across all major variants identified by June 2021, we find limited evidence for positive selection. In particular, we identify positive selection in a small proportion of sites (5-15%) in the protein-coding region of the spike protein across variants. Most other variants did not show a strong signal for positive selection overall, though there were indications of positive selection in the Delta and Kappa variants for the nucleocapsid protein. We additionally use a forward selection procedure to fit a model that allows branch-specific estimates of selection along a phylogeny relating the variants, and find that there is variation in the selective pressure across variants for the spike protein. Our results highlight the utility of computational approaches for identifying genomic regions under selection.
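The maximum likelihood approaches in PAML typically assess positive selection by comparing nested codon models with a likelihood ratio test. The sketch below illustrates that test only. It assumes a comparison with 2 degrees of freedom, as for the codeml site-model pairs M1a vs. M2a or M7 vs. M8; the abstract does not state which model pairs were compared. The function names and the log-likelihood values in the usage example are hypothetical and are not taken from the study.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with even df.

    For df = 2k the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!, so no external library is needed.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def lrt_positive_selection(lnl_null: float, lnl_alt: float, df: int = 2):
    """Likelihood ratio test between a null and an alternative codon model.

    lnl_null: log-likelihood of the model without a positive-selection class.
    lnl_alt:  log-likelihood of the model that allows omega > 1.
    df:       difference in free parameters (assumed 2 here).
    Returns (test statistic, p-value).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf_even_df(stat, df)


# Hypothetical log-likelihoods, for illustration only.
stat, p_value = lrt_positive_selection(lnl_null=-1000.0, lnl_alt=-995.0)
print(f"2*deltaLnL = {stat:.2f}, p = {p_value:.4f}")
```

A small p-value would favor the model that allows a class of sites with omega > 1. Because the null value lies on the boundary of the parameter space for these comparisons, the plain chi-square reference distribution is generally treated as conservative.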