Department of Physics of Complex Systems, Eötvös Loránd University, Budapest, Hungary.
Wigner Research Centre for Physics, 1121, Budapest, Hungary.
Sci Data. 2023 Mar 14;10(1):134. doi: 10.1038/s41597-023-02035-z.
Leveraging recent advances in computational protein modeling with AlphaFold2 (AF2), we provide a complete, curated data set of all single mutations of the spike protein receptor binding domain (RBD) from each of the 7 main SARS-CoV-2 lineages, resulting in 3819 × 7 = 26,733 PDB structures. We visualize the generated structures and show that AF2 pLDDT values correlate with state-of-the-art disorder approximations, implying that the model also captures some internal protein dynamics. Jointly increasing the mutational coverage of structural and phenotype data, coupled with advances in machine learning, can accelerate virology research, specifically future variant prediction. We hope this data release helps further the understanding of the local and global mutational landscape of SARS-CoV-2 and provides insight into how 3D structure acts as a bridge between protein genotype and phenotype.
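The structure count above can be checked with a short enumeration. The following is a minimal sketch, not the authors' pipeline: it assumes that "all single mutations" means replacing each position with each of the 19 other standard amino acids, under which 3819 mutants per lineage corresponds to 3819 / 19 = 201 mutated positions. The function names and the toy sequence are hypothetical and chosen only for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def single_mutants(sequence, start=1):
    """Yield (label, mutated_sequence) for every single substitution.

    Labels follow the common <wild type><position><substitute> style,
    e.g. "G1A". `start` is the residue number of the first position.
    """
    for offset, wild_type in enumerate(sequence):
        for substitute in AMINO_ACIDS:
            if substitute == wild_type:
                continue  # skip the unmutated residue
            mutated = sequence[:offset] + substitute + sequence[offset + 1:]
            yield f"{wild_type}{start + offset}{substitute}", mutated


def count_single_mutants(length):
    """Number of single substitutions for a sequence of `length` residues."""
    return length * (len(AMINO_ACIDS) - 1)


# Hypothetical toy sequence, NOT the real RBD sequence.
toy_sequence = "GSHMK"
toy_mutants = list(single_mutants(toy_sequence))

# Assumption: 201 mutated positions per lineage, inferred from 3819 / 19.
per_lineage = count_single_mutants(201)
total_structures = per_lineage * 7  # 7 lineages, as stated in the abstract
```

Under these assumptions, `per_lineage` and `total_structures` reproduce the 3819 and 26,733 figures quoted in the abstract.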