Laboratory for Computational Biology & Biomolecular Design, School of Biochemical Engineering, Indian Institute of Technology (BHU), Varanasi 221005, Uttar Pradesh, India.
Phys Chem Chem Phys. 2024 May 8;26(18):14046-14061. doi: 10.1039/d4cp01014k.
The COVID-19 pandemic, driven by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), necessitates a profound understanding of the virus and its lifecycle. As an RNA virus with high mutation rates, SARS-CoV-2 exhibits genetic variability leading to the emergence of variants with potential implications. Among its key proteins, the RNA-dependent RNA polymerase (RdRp) is pivotal for viral replication. Notably, RdRp forms dimers with non-structural protein (nsp) subunits, particularly nsp7, which are crucial for efficient viral RNA copying. As with the main protease (Mpro) of SARS-CoV-2, nsp7 might also undergo mutational selection events that generate more stable and adaptable versions of the nsp7 dimer during virus evolution. However, efforts to obtain such cohesive and comprehensive information are lacking. To address this, we conducted this study to decipher the molecular intricacies of nsp7 dimerization using a multifaceted approach. Leveraging computational protein design (CPD), machine learning (ML), AlphaFold v2.0-based structural analysis, and several related computational approaches, we aimed to identify critical residues and mutations influencing nsp7 dimer stability and adaptation. Our methodology involved identifying potential hotspot residues within the dimeric nsp7 interface using an interface-based CPD approach. Through Rosetta-based symmetrical protein design, we designed and modulated nsp7 dimerization, considering selected interface residues. Analysis of physicochemical features revealed acceptable structural changes and several structural and residue-specific insights, emphasizing the intricate nature of such protein-protein complexes. Our ML models, particularly the random forest regressor (RFR), accurately predicted binding affinities, and ML-guided sequence predictions corroborated CPD findings, elucidating potential nsp7 mutations and their impact on binding affinity.
Validation against clinical sequencing data demonstrated the predictive accuracy of our approach. Moreover, AlphaFold v2.0 structural analyses validated optimal dimeric configurations of affinity-enhancing designs, affirming methodological precision. Affinity-enhancing designs exhibited favourable energetics and higher binding affinity as compared to their counterparts. The obtained physicochemical properties, molecular interactions, and sequence predictions advance our understanding of SARS-CoV-2 evolution and inform potential avenues for therapeutic intervention against COVID-19.
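As a rough illustration of the random forest regression step named in the abstract, the sketch below fits scikit-learn's RandomForestRegressor to synthetic data. The four feature columns, the target values, and the function name fit_affinity_model are hypothetical placeholders and are not taken from the study; the descriptors and hyperparameters actually used by the authors are not given here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def fit_affinity_model(seed: int = 0):
    """Fit a random forest regressor on synthetic stand-in data."""
    rng = np.random.default_rng(seed)
    # Hypothetical descriptors for 200 designs (4 made-up features each).
    X = rng.normal(size=(200, 4))
    # Synthetic stand-in for a binding-affinity score; not real data.
    y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=200)

    # Hold out 25% of the designs to estimate predictive accuracy.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    r2 = r2_score(y_test, model.predict(X_test))
    return model, r2


if __name__ == "__main__":
    model, r2 = fit_affinity_model()
    print(f"held-out R^2: {r2:.2f}")
    print("feature importances:", model.feature_importances_.round(2))
```

In a setting like the one described, the feature importances of such a model could hint at which descriptors drive the predicted affinity, although that interpretation depends entirely on the real features used.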