Department of Computer Science, Hanyang University, Seoul 04763, Korea.
Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea.
Bioinformatics. 2023 Dec 1;39(12). doi: 10.1093/bioinformatics/btad712.
Predicting protein structures with high accuracy is a critical challenge for the broad life-science community and for industry. Despite the progress made by deep neural networks such as AlphaFold2, further improvement is needed in the quality of fine structural details, such as side chains, as well as of protein backbones.
Building on the success of AlphaFold2, we modified the side-chain torsion-angle and frame aligned point error (FAPE) losses, added loss functions for side-chain confidence and secondary-structure prediction, and replaced template feature generation with a new alignment method based on conditional random fields. We also re-optimized the models by conformational space annealing, using a molecular mechanics energy function that integrates potential energies derived from distogram and side-chain predictions. In the CASP15 blind test for single-protein and domain modeling (109 domains), DeepFold ranked fourth among 132 groups, with improved structural detail in terms of backbone accuracy, side-chain accuracy, and MolProbity scores. For protein backbone accuracy, DeepFold achieved a median GDT-TS score of 88.64, compared with 85.88 for AlphaFold2. For TBM-easy/hard targets, DeepFold ranked first based on GDT-TS Z-scores. These results demonstrate its practical value to the structural biology community, which requires highly accurate structures. In addition, a thorough analysis of 55 domains from 39 targets with publicly available structures indicates that DeepFold achieves superior side-chain accuracy and MolProbity scores among the top-performing groups.
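To make the two evaluation quantities above concrete, here is a minimal sketch of a simplified GDT-TS score and of per-target Z-scores. It is an illustration under stated assumptions, not DeepFold's code or the official CASP assessment pipeline: the function names `gdt_ts` and `z_scores` are hypothetical, the model is assumed to be already superposed on the reference structure (the real GDT search tries many superpositions and keeps the best fraction for each cutoff), and CASP's additional ranking steps, such as outlier removal and handling of negative Z-scores, are omitted.

```python
from statistics import mean, pstdev

# Distance cutoffs (in Angstroms) averaged by GDT-TS.
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(ca_deviations):
    """Simplified GDT-TS (0-100) from per-residue C-alpha deviations.

    Assumes a single, fixed superposition of model onto reference. The
    official metric searches over superpositions for each cutoff, so this
    single-superposition value can underestimate it.
    """
    n = len(ca_deviations)
    # Fraction of residues within each cutoff, then averaged over cutoffs.
    fractions = [sum(d <= c for d in ca_deviations) / n for c in GDT_TS_CUTOFFS]
    return 100.0 * mean(fractions)


def z_scores(scores_by_group):
    """Plain Z-scores of each group's score on one target.

    CASP's official ranking adds further steps (e.g. outlier removal and
    treatment of negative values) that are not reproduced here.
    """
    values = list(scores_by_group.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # All groups scored identically: no group stands out.
        return {g: 0.0 for g in scores_by_group}
    return {g: (s - mu) / sigma for g, s in scores_by_group.items()}
```

For example, residues deviating by 0.5, 1.5, 3.0, 6.0, and 10.0 Å fall within the four cutoffs at fractions 0.2, 0.4, 0.6, and 0.8, giving a simplified GDT-TS of 50. Summing per-target Z-scores across targets is what lets a group rank first on a target category even when its raw median score is close to other groups'.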
The DeepFold tools are open-source software available at https://github.com/newtonjoo/deepfold.