Mi Tianyu, Xiao Nan, Gong Haipeng
MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China.
Beijing Frontier Research Center for Biological Structure, Tsinghua University, Beijing, China.
Protein Sci. 2025 Feb;34(2):e70041. doi: 10.1002/pro.70041.
A key step in mainstream protein structure prediction is building the 3D protein structure from predicted 2D inter-residue geometric information. State-of-the-art methods such as AlphaFold2 integrate this folding step into a unified neural network to allow end-to-end training, whereas some traditional methods such as trRosetta implement it separately in the Rosetta folding environment. Although less accurate, the conventional approach can sample multiple protein conformations compatible with the predicted geometric constraints, and thus partially captures dynamic information. Here, we propose GDFold2, a novel protein folding environment, to address the limitations of Rosetta. First, GDFold2 is computationally efficient: it can complete multiple folding processes in parallel within minutes for generic proteins. Second, GDFold2 supports freely defined objective functions to meet diverse optimization requirements. Moreover, we propose a quality assessment (QA) model that provides reliable predictions of the quality of protein structures folded by GDFold2, substantially simplifying the selection of structural models. GDFold2 and the QA model can be combined to investigate the transition path between protein conformational states. The online server is available at https://structpred.life.tsinghua.edu.cn/server_gdfold2.html.
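To illustrate the general idea of folding from 2D inter-residue geometry by minimizing a freely defined objective, the following is a minimal sketch only. It is not GDFold2's implementation or API. The names `fold` and `squared_error_grad` are hypothetical, the constraints are reduced to a plain target distance matrix (an assumption; predicted 2D geometry may include more than distances), and the optimizer is plain gradient descent on point coordinates.

```python
import math
import random


def squared_error_grad(d, t):
    """Derivative of the penalty (d - t)^2 with respect to the distance d."""
    return 2.0 * (d - t)


def fold(target, loss_grad=squared_error_grad, steps=2000, lr=0.01, seed=0):
    """Fit 3D coordinates to a target distance matrix by gradient descent.

    target    -- symmetric n x n list of desired pairwise distances
    loss_grad -- user-supplied derivative of the per-pair penalty (hypothetical hook)
    """
    rng = random.Random(seed)
    n = len(target)
    # Start from random coordinates; a different seed gives a different trajectory.
    coords = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n)]
    for _ in range(steps):
        grads = [[0.0, 0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                diff = [coords[i][k] - coords[j][k] for k in range(3)]
                d = max(math.sqrt(sum(x * x for x in diff)), 1e-9)
                # Chain rule: d(penalty)/d(coords[i]) = loss_grad(d, t) * diff / d
                g = loss_grad(d, target[i][j]) / d
                for k in range(3):
                    grads[i][k] += g * diff[k]
                    grads[j][k] -= g * diff[k]
        for i in range(n):
            for k in range(3):
                coords[i][k] -= lr * grads[i][k]
    return coords


if __name__ == "__main__":
    # Toy constraint set: four points that should all be one unit apart.
    toy_target = [[0.0 if i == j else 1.0 for j in range(4)] for i in range(4)]
    for point in fold(toy_target):
        print([round(x, 3) for x in point])
```

Swapping in a different `loss_grad` changes the objective without touching the optimizer, and running `fold` with several seeds yields several candidate conformations for the same constraints. This mirrors, in a very simplified form, the sampling and custom-objective ideas described above.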