Ille Alexander M, Anas Emily, Mathews Michael B, Burley Stephen K
Rutgers Cancer Institute, Rutgers, The State University of New Jersey, Newark, New Jersey 07103, USA.
College of Computing, Georgia Institute of Technology, Atlanta, Georgia 30332, USA.
Struct Dyn. 2025 Jun 24;12(3):030902. doi: 10.1063/4.0000765. eCollection 2025 May.
The 2024 Nobel Prize in Chemistry was awarded in part for protein structure prediction using AlphaFold2, an artificial intelligence/machine learning (AI/ML) model trained on vast amounts of sequence and three-dimensional structure data. AlphaFold2 and related models, including RoseTTAFold and ESMFold, employ specialized neural network architectures driven by attention mechanisms to infer relationships between sequence and structure. At a fundamental level, these AI/ML models operate on the long-standing hypothesis that the structure of a protein is determined by its amino acid sequence. More recently, AlphaFold2 has been adapted for the prediction of multiple protein conformations by subsampling multiple sequence alignments. Herein, we provide an overview of the deterministic relationship between sequence and structure, which was hypothesized over half a century ago with profound implications for the biological sciences ever since. We postulate that protein conformational dynamics are also determined, at least in part, by amino acid sequence and that this relationship may be leveraged for construction of AI/ML models dedicated to predicting protein conformational ensembles. Accordingly, we describe a conceptual model architecture, which may be trained on sequence data in combination with conformationally sensitive structural information, coming primarily from nuclear magnetic resonance (NMR) spectroscopy. Notwithstanding certain limitations in this context, NMR offers abundant structural heterogeneity conducive to conformational ensemble prediction. As NMR and other data continue to accumulate, sequence-informed prediction of protein structural dynamics with AI/ML has the potential to emerge as a transformative capability across the biological sciences.
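The MSA subsampling mentioned above can be illustrated with a minimal sketch. The abstract does not specify a procedure, so everything here is an assumption made for illustration: the function names `subsample_msa` and `subsample_many`, the `max_seqs` parameter, uniform random sampling without replacement, and the convention that the first row of the alignment is the query sequence. Each shallow alignment would then be passed to a structure predictor, which is not shown.

```python
import random


def subsample_msa(msa, max_seqs, seed=None):
    """Return the query (first row) plus a random subset of the remaining rows.

    msa      : list of aligned sequences (equal-length strings); row 0 is
               assumed to be the query sequence.
    max_seqs : total number of rows to keep, including the query.
    seed     : optional seed so that a given subsample is reproducible.
    """
    if not msa:
        raise ValueError("MSA must contain at least the query sequence")
    if max_seqs < 1:
        raise ValueError("max_seqs must be at least 1")

    rng = random.Random(seed)
    query, others = msa[0], msa[1:]
    k = min(max_seqs - 1, len(others))

    # Sample row indices without replacement, then sort them so the
    # retained rows keep their original order in the alignment.
    picked = sorted(rng.sample(range(len(others)), k))
    return [query] + [others[i] for i in picked]


def subsample_many(msa, max_seqs, n_samples, seed=0):
    """Draw several independent shallow alignments, one per seed."""
    return [subsample_msa(msa, max_seqs, seed=seed + i) for i in range(n_samples)]


if __name__ == "__main__":
    # Toy alignment, hypothetical sequences for illustration only.
    toy_msa = [
        "MKTAYIAK",  # query
        "MKTAYLAK",
        "MRTAYIAK",
        "MKSAYIAK",
        "MKTA-IAK",
        "MKTAYIGK",
    ]
    for sub in subsample_many(toy_msa, max_seqs=3, n_samples=4):
        print(sub)
```

The query is always retained so that every shallow alignment still describes the same target protein; only the evolutionary context around it varies between samples.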