Department of Biomedical Engineering, Boston University, Boston, MA 02215.
Department of Chemistry, Boston University, Boston, MA 02215.
Proc Natl Acad Sci U S A. 2024 Nov 26;121(48):e2412719121. doi: 10.1073/pnas.2412719121. Epub 2024 Nov 20.
The goal of this paper is to predict the conformational distributions of ligand binding sites using the AlphaFold2 (AF2) protein structure prediction program with stochastic subsampling of the multiple sequence alignment (MSA). We explored the opening of cryptic ligand binding sites in 16 proteins, where the closed and open conformations define the expected extreme points of the conformational variation. Because many structures of these proteins are available in the Protein Data Bank (PDB), we were able to study whether the distribution of X-ray structures affects the distribution of AF2 models. We found that AF2 generates both a cluster of open and a cluster of closed models for proteins that have comparable numbers of open and closed structures in the PDB and not too many other conformations. This was observed even with default MSA parameters, that is, without further subsampling. In contrast, with the exception of a single protein, AF2 did not yield multiple clusters of conformations for proteins that had imbalanced numbers of open and closed structures in the PDB, or had substantial numbers of other structures. Subsampling improved the results for only a single protein, and very shallow MSAs led to incorrect structures. The ability to generate both open and closed conformations for six of the 16 proteins agrees with the success rates of similar studies reported in the literature. However, we showed that this partial success is due to AF2 "remembering" the conformational distributions in the PDB and that the approach fails to predict rarely seen conformations.
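As an illustration of what stochastic MSA subsampling can look like in practice, the following minimal Python sketch draws several shallow alignments from one full alignment, always keeping the query sequence as the first row. It is not the authors' pipeline: the function name `subsample_msa`, the `depth` and `seed` parameters, and the toy sequences are all hypothetical, and the structure prediction step itself is left out.

```python
import random


def subsample_msa(msa, depth, seed=None):
    """Return the query (first row) plus up to depth - 1 randomly chosen other rows.

    Hypothetical helper for illustration; `msa` is a list of aligned sequences
    of equal length, with the query sequence first.
    """
    if not msa:
        raise ValueError("MSA must contain at least the query sequence")
    if depth < 1:
        raise ValueError("depth must be >= 1")
    rng = random.Random(seed)  # local generator, so results are reproducible per seed
    query, others = msa[0], msa[1:]
    k = min(depth - 1, len(others))  # never request more rows than are available
    return [query] + rng.sample(others, k)


# Toy alignment (made-up sequences, all of the same aligned length).
toy_msa = ["MKTAYIAK", "MKTAYLAK", "MRTA-IAK", "MKSAYIAR", "MKTGYIAK"]

# Each seed gives one shallow alignment; in a real workflow each would be
# passed to the structure predictor separately and the resulting models clustered.
samples = [subsample_msa(toy_msa, depth=3, seed=s) for s in range(4)]
```

Smaller values of `depth` correspond to shallower alignments; as the abstract notes, very shallow MSAs led to incorrect structures, so the depth would need to be chosen with care.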