Gut Jannik Adrian, Lemmin Thomas
Institute of Biochemistry and Molecular Medicine, University of Bern, Bern 3012, Switzerland.
Graduate School for Cellular and Biomedical Sciences (GCB), University of Bern, Bern 3012, Switzerland.
Bioinform Adv. 2024 Nov 25;5(1):vbae187. doi: 10.1093/bioadv/vbae187. eCollection 2025.
Protein structure prediction aims to infer a protein's three-dimensional (3D) structure from its amino acid sequence. Protein structure is pivotal for elucidating protein functions and interactions, and for driving biotechnological innovation. The deep learning model AlphaFold2 has revolutionized this field by leveraging phylogenetic information from multiple sequence alignments (MSAs) to achieve remarkable accuracy in protein structure prediction. However, a key question remains: how well does AlphaFold2 understand protein structures? This study investigates AlphaFold2's capabilities when relying primarily on high-quality template structures, without the additional information provided by MSAs. By designing experiments that probe local and global structural understanding, we aimed to dissect its dependence on specific features and its ability to handle missing information. Our findings revealed AlphaFold2's reliance on sterically valid Cβ atoms for correctly interpreting structural templates. Additionally, we observed its remarkable ability to recover 3D structures from certain perturbations, and the negligible impact of the previous structure during recycling. Collectively, these results support the hypothesis that AlphaFold2 has learned an accurate biophysical energy function. However, this function seems most effective for local interactions. Our work advances understanding of how deep learning models predict protein structures and provides guidance for researchers aiming to overcome limitations in these models.
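To make "sterically valid Cβ" concrete: given a residue's backbone N, Cα, and C coordinates, an idealized Cβ position is fully determined by standard tetrahedral geometry. The sketch below is not the authors' code. It uses the empirical coefficients commonly used for virtual Cβ placement in structure-prediction pipelines (e.g. trRosetta), and the helper names `virtual_cb` and `is_plausible_cb` and the 0.3 Å tolerance are hypothetical choices for illustration. It shows one way to check whether a template's Cβ sits where backbone geometry says it should.

```python
import math

Vec = tuple[float, float, float]

# Empirical coefficients for an idealized virtual Cβ (assumption: the widely
# used trRosetta-style formula, not necessarily what the study used).
_A, _B, _C = -0.58273431, 0.56802827, -0.54067466


def _sub(u: Vec, v: Vec) -> Vec:
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _cross(u: Vec, v: Vec) -> Vec:
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def virtual_cb(n: Vec, ca: Vec, c: Vec) -> Vec:
    """Place an idealized Cβ from backbone N, Cα and C coordinates (Å)."""
    b = _sub(ca, n)      # N -> Cα bond vector
    cc = _sub(c, ca)     # Cα -> C bond vector
    a = _cross(b, cc)    # normal to the N-Cα-C plane
    return tuple(_A * a[i] + _B * b[i] + _C * cc[i] + ca[i] for i in range(3))


def is_plausible_cb(n: Vec, ca: Vec, c: Vec, cb: Vec, tol: float = 0.3) -> bool:
    """Hypothetical check: is an observed Cβ within `tol` Å of the ideal one?"""
    return math.dist(cb, virtual_cb(n, ca, c)) <= tol


if __name__ == "__main__":
    # Ideal backbone: N-Cα 1.458 Å, Cα-C 1.525 Å, N-Cα-C angle about 111°.
    n, ca, c = (-1.458, 0.0, 0.0), (0.0, 0.0, 0.0), (0.54651, 1.42371, 0.0)
    cb = virtual_cb(n, ca, c)
    print(f"Cα-Cβ distance: {math.dist(ca, cb):.3f} Å")
```

With an ideal backbone the placed Cβ lies about 1.53 Å from Cα, close to the standard bond length; a template whose Cβ deviates strongly from this position would fail the check.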
Data and implementation are available at https://github.com/ibmm-unibe-ch/template-analysis.