Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, United States.
Bioinformatics. 2023 May 4;39(5). doi: 10.1093/bioinformatics/btad298.
State-of-the-art protein structure prediction methods such as AlphaFold are widely used to predict the structures of uncharacterized proteins in biomedical research. There is a significant need to further improve the quality and nativeness of the predicted structures to enhance their usability. In this work, we develop ATOMRefine, a deep learning-based, end-to-end, all-atom protein structural model refinement method. It uses an SE(3)-equivariant graph transformer network to directly refine protein atomic coordinates in a predicted tertiary structure represented as a molecular graph.
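To illustrate what representing a structure as a molecular graph can look like, the following is a minimal sketch and not the ATOMRefine implementation. It assumes atoms are nodes and that an edge connects any two atoms within a distance cutoff; the function names `build_molecular_graph` and `random_rotation` and the 5 Å default cutoff are hypothetical choices made for this example. It also shows that such a graph is unchanged by rotating and translating the input, which is the symmetry an SE(3)-equivariant network is designed to respect.

```python
import numpy as np


def build_molecular_graph(coords, cutoff=5.0):
    """Connect every pair of distinct atoms closer than `cutoff` angstroms.

    Returns (edge_index, edge_dist): a 2 x E array of directed edges and
    the matching E interatomic distances. The cutoff rule is an assumption
    of this sketch, not a detail taken from the paper.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Exclude self-loops on the diagonal.
    mask = (dist <= cutoff) & ~np.eye(len(coords), dtype=bool)
    src, dst = np.nonzero(mask)
    return np.stack([src, dst]), dist[src, dst]


def random_rotation(rng):
    """Draw a proper 3D rotation matrix (determinant +1) via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q


# Toy coordinates: three atoms in a chain plus one distant atom.
atoms = np.array([[0.0, 0.0, 0.0],
                  [1.5, 0.0, 0.0],
                  [3.0, 0.0, 0.0],
                  [10.0, 0.0, 0.0]])
edges, dists = build_molecular_graph(atoms, cutoff=2.0)

# Apply a rigid motion (rotation plus translation) and rebuild the graph.
rng = np.random.default_rng(0)
moved = atoms @ random_rotation(rng).T + np.array([4.0, -2.0, 7.0])
edges_moved, dists_moved = build_molecular_graph(moved, cutoff=2.0)
```

Because the edges depend only on interatomic distances, `edges_moved` matches `edges` and the distances agree up to floating-point error. A network that refines coordinates directly must additionally ensure that its predicted coordinates rotate and translate together with the input.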
The method is first trained and tested on the structural models in AlphaFoldDB whose experimental structures are known, and then blindly tested on 69 CASP14 regular targets and 7 CASP14 refinement targets. ATOMRefine improves the quality of both the backbone atoms and the all-atom conformation of the initial structural models generated by AlphaFold. It also performs better than two state-of-the-art refinement methods on multiple evaluation metrics, including an all-atom model quality score, the MolProbity score, which is based on the analysis of all-atom contacts, bond lengths, atom clashes, torsion angles, and side-chain rotamers. As ATOMRefine can refine a protein structure quickly, it provides a viable, fast solution for improving protein geometry and fixing structural errors of predicted structures through direct coordinate refinement.
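To give a feel for one ingredient of an all-atom quality score, the sketch below counts atom clashes in a very crude way. It is not the MolProbity score, which relies on all-atom contact analysis with atom-specific radii; here a clash is simply assumed to be any non-bonded atom pair closer than a fixed distance. The function name `crude_clash_score`, the 2.2 Å threshold, and the `bonded_pairs` argument are hypothetical and chosen only for illustration.

```python
import numpy as np


def crude_clash_score(coords, bonded_pairs=(), min_dist=2.2):
    """Count non-bonded atom pairs closer than `min_dist`, per 1000 atoms.

    A simplified stand-in for a clash metric: a single distance threshold
    replaces per-atom van der Waals radii.
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Normalise bonded pairs to (low, high) index order for lookup.
    bonded = {tuple(sorted(pair)) for pair in bonded_pairs}
    # Visit each unordered pair once via the upper triangle.
    rows, cols = np.triu_indices(n, k=1)
    clashes = sum(
        1
        for a, b in zip(rows, cols)
        if dist[a, b] < min_dist and (int(a), int(b)) not in bonded
    )
    return 1000.0 * clashes / n


# Toy example: atoms 0 and 1 are 1.5 angstroms apart, atom 2 is far away.
toy = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [5.0, 0.0, 0.0]]
score_if_bonded = crude_clash_score(toy, bonded_pairs=[(0, 1)])
score_if_not_bonded = crude_clash_score(toy)
```

When atoms 0 and 1 are declared bonded, their short distance is expected and no clash is counted; otherwise the same pair counts as one clash among three atoms. Lower values indicate fewer steric overlaps, which is the direction a refinement method aims for.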
The source code of ATOMRefine is available in the GitHub repository (https://github.com/BioinfoMachineLearning/ATOMRefine). All the required data for training and testing are available at https://doi.org/10.5281/zenodo.6944368.