AlphaFold2's training set powers its predictions of fold-switched conformations.

Authors

Schafer Joseph W, Porter Lauren L

Affiliations

National Library of Medicine, National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, 20894, USA.

National Heart, Lung, and Blood Institute, Biochemistry and Biophysics Center, National Institutes of Health, Bethesda, MD, 20892, USA.

Publication

bioRxiv. 2024 Oct 15:2024.10.11.617857. doi: 10.1101/2024.10.11.617857.

Abstract

AlphaFold2 (AF2), a deep-learning-based model that predicts protein structures from their amino acid sequences, has recently been used to predict multiple protein conformations. In some cases, AF2 has successfully predicted both dominant and alternative conformations of fold-switching proteins, which remodel their secondary and tertiary structures in response to cellular stimuli. Whether AF2 has learned enough protein folding principles to reliably predict alternative conformations outside of its training set is unclear. Here, we address this question by assessing whether CFold, an implementation of the AF2 network trained on a more limited subset of experimentally determined protein structures, predicts alternative conformations of eight fold switchers from six protein families. Previous work suggests that AF2 predicted these alternative conformations by memorizing them during training. Unlike AF2, CFold's training set contains only one of these alternative conformations. Despite sampling 1,300 to 4,400 structures per protein with various sequence sampling techniques, CFold predicted only one alternative structure outside of its training set accurately and with high confidence, while also generating experimentally inconsistent structures with higher confidence. Though these results indicate that AF2's current success in predicting alternative conformations of fold switchers stems largely from its training data, results from a sequence pruning technique suggest developments that could lead to a more reliable generative model in the future.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/31cb/11722258/91f9be72f83b/nihpp-2024.10.11.617857v1-f0001.jpg
