Predicting multiple conformations of ligand binding sites in proteins suggests that AlphaFold2 may remember too much.

Affiliations

Department of Biomedical Engineering, Boston University, Boston, MA 02215.

Department of Chemistry, Boston University, Boston, MA 02215.

Publication details

Proc Natl Acad Sci U S A. 2024 Nov 26;121(48):e2412719121. doi: 10.1073/pnas.2412719121. Epub 2024 Nov 20.

Abstract

The goal of this paper is to predict the conformational distributions of ligand binding sites using the AlphaFold2 (AF2) protein structure prediction program with stochastic subsampling of the multiple sequence alignment (MSA). We explored the opening of cryptic ligand binding sites in 16 proteins, where the closed and open conformations define the expected extreme points of the conformational variation. Because these proteins have many structures in the Protein Data Bank (PDB), we were able to study whether the distribution of X-ray structures affects the distribution of AF2 models. We found that AF2 generates both a cluster of open models and a cluster of closed models for proteins that have comparable numbers of open and closed structures in the PDB and not too many other conformations. This was observed even with default MSA parameters, that is, without further subsampling. In contrast, with the exception of a single protein, AF2 did not yield multiple clusters of conformations for proteins that had imbalanced numbers of open and closed structures in the PDB or had substantial numbers of other structures. Subsampling improved the results for only a single protein, but very shallow MSAs led to incorrect structures. The ability to generate both open and closed conformations for six of the 16 proteins agrees with the success rates of similar studies reported in the literature. However, we showed that this partial success is due to AF2 "remembering" the conformational distributions in the PDB and that the approach fails to predict rarely seen conformations.
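
The abstract does not say how the MSA was subsampled or how models were sorted into open and closed clusters. As a rough illustration only, the Python sketch below assumes a simple scheme: keep the query sequence, randomly draw a fixed number of the remaining MSA rows with a seeded generator (one seed per AF2 run), and label each resulting model as open or closed by whichever reference structure it is closer to, using RMSD values computed elsewhere. The helper names (`subsample_msa`, `classify_model`, `has_both_states`) are hypothetical; this is not AF2's internal MSA cluster sampling and not necessarily the authors' protocol.

```python
import random
from typing import Iterable, List, Sequence, Tuple


def subsample_msa(msa: Sequence[str], depth: int, seed: int) -> List[str]:
    """Keep the query (row 0) plus up to `depth` randomly chosen other rows.

    Hypothetical helper: produces a shallow, stochastic MSA; it does not
    reproduce AlphaFold2's internal MSA cluster sampling.
    """
    if not msa:
        raise ValueError("MSA must contain at least the query sequence")
    if depth < 0:
        raise ValueError("depth must be non-negative")
    query, others = msa[0], list(msa[1:])
    rng = random.Random(seed)  # separate generator so each seed is reproducible
    return [query] + rng.sample(others, min(depth, len(others)))


def classify_model(rmsd_closed: float, rmsd_open: float) -> str:
    """Label a model by the reference structure it is closer to (ties -> closed)."""
    return "closed" if rmsd_closed <= rmsd_open else "open"


def has_both_states(rmsd_pairs: Iterable[Tuple[float, float]]) -> bool:
    """True if the models include at least one open and one closed label."""
    labels = {classify_model(c, o) for c, o in rmsd_pairs}
    return {"open", "closed"} <= labels


if __name__ == "__main__":
    # Toy alignment: the first row is the query sequence.
    msa = ["MKTAYIAK", "MKTAYLAK", "MRTAYIAK", "MKSAYIAK", "MKTVYIAK"]
    for seed in range(3):
        print(seed, subsample_msa(msa, depth=2, seed=seed))

    # Made-up (RMSD to closed, RMSD to open) pairs in angstroms, for illustration only.
    pairs = [(0.8, 3.1), (1.0, 2.9), (3.2, 0.9)]
    print([classify_model(c, o) for c, o in pairs], has_both_states(pairs))
```

Nearest-reference labeling is used here only because the closed and open structures mark the extremes of the conformational variation; a real analysis would likely compute RMSD over the binding-site residues and might use an unsupervised clustering method instead.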

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/514f/11621821/56dc361a0a74/pnas.2412719121fig01.jpg