Department of Microbiology, Medical School, National and Kapodistrian University of Athens, Athens, Greece.
ELGO-Demeter, Plant Protection Division of Patras, Laboratory of Virology, Patras, Greece.
mSystems. 2023 Aug 31;8(4):e0044023. doi: 10.1128/msystems.00440-23. Epub 2023 Jul 11.
Amino acids in variable positions of proteins may be correlated, with potential structural and functional implications. Here, we apply exact tests of independence in R × C contingency tables to examine noise-free associations between variable positions of the SARS-CoV-2 spike protein, using as a paradigm sequences from Greece deposited in GISAID (N = 6,683/1,078 full length) for the period 29 February 2020 to 26 April 2021, which essentially covers the first three pandemic waves. We examine the fate and complexity of these associations by network analysis, using associated positions (exact P ≤ 0.001 and Average Product Correction ≥ 2) as links and the corresponding positions as nodes. We found a temporal linear increase of positional differences and a gradual expansion of the number of position associations over time, represented by a temporally evolving intricate web, resulting in a non-random complex network of 69 nodes and 252 links. Overconnected nodes corresponded to the most adapted variant positions in the population, suggesting a direct relation between network degree and position functional importance. Modular analysis revealed 25 k-cliques comprising 3 to 11 nodes. At different k-clique resolutions, one to four communities were formed, capturing epistatic associations of circulating variants (Alpha, Beta, B.1.1.318), but also Delta, which dominated the evolutionary landscape later in the pandemic. Cliques of aminoacidic positional associations tended to occur in single sequences, enabling the recognition of epistatic positions in real-world virus populations. Our findings provide a novel way of understanding epistatic relationships in viral proteins with potential applications in the design of virus control procedures.
IMPORTANCE Paired positional associations of adapted amino acids in virus proteins may provide new insights for understanding virus evolution and variant formation. We investigated potential intramolecular relationships between variable SARS-CoV-2 spike positions by exact tests of independence in R × C contingency tables, having applied Average Product Correction (APC) to eliminate background noise. Associated positions (exact P ≤ 0.001 and APC ≥ 2) formed a non-random, epistatic network of 25 cliques and 1-4 communities at different clique resolutions, revealing evolutionary ties between variable positions of circulating variants and a predictive potential of previously unknown network positions. Cliques of different sizes represented theoretical combinations of changing residues in sequence space, allowing the identification of significant aminoacidic combinations in single sequences of real-world populations. Our analytic approach, which links network structural aspects to mutational aminoacidic combinations in the spike sequence population, offers a novel way to understand virus epidemiology and evolution.
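To make the data structures named above concrete, the following is a minimal Python sketch, not the authors' pipeline. It shows how an R × C contingency table can be tabulated from two alignment columns, and how a list of associated position pairs becomes a network whose node degrees and maximal cliques can be read off. The toy sequences, the position labels 1 to 5, and all function names are hypothetical. The exact test, the APC filter, and the k-clique community step are not implemented here; the clique search is a plain Bron-Kerbosch enumeration of maximal cliques.

```python
from collections import Counter, defaultdict


def contingency_table(seqs, i, j):
    """Tabulate residue co-occurrence at alignment columns i and j (0-based).

    Returns the row labels (residues seen at i), the column labels
    (residues seen at j), and the R x C table of counts.
    """
    counts = Counter((s[i], s[j]) for s in seqs)
    rows = sorted({a for a, _ in counts})
    cols = sorted({b for _, b in counts})
    table = [[counts[(a, b)] for b in cols] for a in rows]
    return rows, cols, table


def build_network(links):
    """Turn associated position pairs (links) into an adjacency map (nodes)."""
    adj = defaultdict(set)
    for u, v in links:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch (no pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


# Toy two-column "alignment" (hypothetical data, for illustration only).
toy_seqs = ["AK", "AK", "AK", "GT", "GT", "AT"]
rows, cols, table = contingency_table(toy_seqs, 0, 1)

# Hypothetical links: pairs of positions assumed to have passed the
# association thresholds; the labels 1..5 are placeholders.
toy_links = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (3, 5)]
adj = build_network(toy_links)
degree = {node: len(neigh) for node, neigh in adj.items()}
cliques = maximal_cliques(adj)

print(rows, cols, table)
print(degree)
print(cliques)
```

In this toy network, position 3 has the highest degree and sits in both triangles, which is the kind of overconnected node the abstract relates to functional importance. For real data, a dedicated exact-test routine and a clique percolation implementation would replace the simplified steps here.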