The Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia.
Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia.
Protein Sci. 2024 Aug;33(8):e5096. doi: 10.1002/pro.5096.
Nuclear magnetic resonance (NMR) crystallography is one of the main methods in structural biology for analyzing protein stereochemistry and structure. The chemical shift of a resonance frequency reflects how protons in a molecule produce distinct NMR signals in different chemical environments. Interpreting chemical shifts from NMR signals can be challenging, since having an NMR structure does not necessarily provide all the required chemical shift information. Predictive models are therefore essential for accurately deducing chemical shifts, either from protein structures or, more ideally, directly from amino acid sequences. Here, we present EFG-CS, a web server that specializes in chemical shift prediction. EFG-CS employs a machine learning-based transfer prediction model for backbone atom chemical shift prediction, using ESMFold-predicted protein structures. Additionally, EFG-CS incorporates a graph neural network-based model to provide comprehensive side-chain atom chemical shift predictions. Our method demonstrated reliable performance in backbone atom prediction, achieving comparable accuracy levels, with root mean square errors (RMSE) of 0.30 ppm for H, 0.22 ppm for Hα, 0.89 ppm for C, 0.89 ppm for Cα, 0.84 ppm for Cβ, and 1.69 ppm for N. Our approach also showed predictive capability for side-chain atom chemical shifts, achieving RMSE values of 0.71 ppm for Hβ, 0.74-1.15 ppm for Hδ, and 0.58-0.94 ppm for Hγ, using amino acid sequences alone, without homology or feature curation. This work shows for the first time that generative AI protein models can predict NMR shifts with accuracy nearly comparable to experimental models. The web server is freely available at https://biosig.lab.uq.edu.au/efg_cs, and the chemical shift prediction results can be downloaded in tabular format and visualized in 3D.
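The per-atom RMSE figures quoted above can be illustrated with a minimal sketch of how such a metric is typically computed from paired predicted and experimental shifts. This is not the EFG-CS implementation; the function name `rmse_by_atom`, the tuple layout, and the numeric values are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def rmse_by_atom(pairs):
    """Return {atom_type: RMSE in ppm}.

    `pairs` is an iterable of (atom_type, predicted_ppm, experimental_ppm).
    This layout is an assumption for the sketch, not a format used by EFG-CS.
    """
    squared_errors = defaultdict(list)
    for atom, predicted, experimental in pairs:
        squared_errors[atom].append((predicted - experimental) ** 2)
    # RMSE = sqrt(mean of squared errors), computed separately per atom type
    return {
        atom: math.sqrt(sum(errs) / len(errs))
        for atom, errs in squared_errors.items()
    }


# Made-up values for illustration only; these are not data from the paper.
example = [
    ("CA", 58.0, 57.0),
    ("CA", 55.0, 56.0),
    ("N", 120.0, 118.0),
    ("N", 121.0, 121.0),
]
result = rmse_by_atom(example)
print(result)
```

Grouping by atom type before averaging matters because shift ranges differ widely between nuclei, so a pooled RMSE would be dominated by the atoms with the largest spread, such as N.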