Classifying protein kinase conformations with machine learning.

Affiliation

Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France.

Publication information

Protein Sci. 2024 Apr;33(4):e4918. doi: 10.1002/pro.4918.

Abstract

Protein kinases are key actors of signaling networks and important drug targets. They cycle between active and inactive conformations, distinguished by a few elements within the catalytic domain. One is the activation loop, whose conserved DFG motif can occupy DFG-in, DFG-out, and some rarer conformations. Annotation and classification of the structural kinome are important, as different conformations can be targeted by different inhibitors and activators. Valuable resources exist; however, large-scale applications will benefit from increased automation and interpretability of structural annotation. Interpretable machine learning models are described for this purpose, based on ensembles of decision trees. To train them, a set of catalytic domain sequences and structures was collected, somewhat larger and more diverse than existing resources. The structures were clustered based on the DFG conformation and manually annotated. They were then used as training input. Two main models were constructed, which distinguished active/inactive and in/out/other DFG conformations. They considered initially 1692 structural variables, spanning the whole catalytic domain, then identified ("learned") a small subset that sufficed for accurate classification. The first model correctly labeled all but 3 of 3289 structures as active or inactive, while the second assigned the correct DFG label to all but 17 of 8826 structures. The most potent classifying variables were all related to well-known structural elements in or near the activation loop and their ranking gives insights into the conformational preferences. The models were used to automatically annotate 3850 kinase structures predicted recently with the Alphafold2 tool, showing that Alphafold2 reproduced the active/inactive but not the DFG-in proportions seen in the Protein Data Bank. We expect the models will be useful for understanding and engineering kinases.
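The idea of an ensemble of decision trees that "learns" which few variables suffice for classification can be illustrated with a minimal sketch. This is not the authors' implementation: it uses synthetic random data instead of kinase structural variables, depth-1 trees (stumps) instead of full decision trees, and a simple selection count as the importance score. All function names (`make_data`, `fit_stump`, `fit_ensemble`, `rank_features`) are hypothetical and exist only for this example.

```python
import random
from collections import Counter


def make_data(n_samples, n_features, informative, rng):
    """Synthetic stand-in for structural variables; only one feature sets the label."""
    X = [[rng.random() for _ in range(n_features)] for _ in range(n_samples)]
    y = [1 if row[informative] > 0.5 else 0 for row in X]
    return X, y


def fit_stump(X, y):
    """Fit a depth-1 tree: the (feature, threshold, polarity) with the fewest errors."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            # Errors if we predict class 1 whenever the feature exceeds t.
            errors = sum((row[f] > t) != (label == 1) for row, label in zip(X, y))
            # Polarity 0 flips the rule, so its error count is the complement.
            for polarity, err in ((1, errors), (0, len(y) - errors)):
                if best is None or err < best[0]:
                    best = (err, f, t, polarity)
    return best[1:]


def predict_stump(stump, row):
    """Apply one stump to one sample and return the predicted class (0 or 1)."""
    f, t, polarity = stump
    return int((row[f] > t) == (polarity == 1))


def fit_ensemble(X, y, n_trees, rng):
    """Bagging: each stump is trained on a bootstrap resample of the data."""
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps


def predict(stumps, row):
    """Majority vote over the ensemble."""
    votes = sum(predict_stump(s, row) for s in stumps)
    return int(2 * votes > len(stumps))


def rank_features(stumps):
    """Rank features by how many stumps selected them (a crude importance score)."""
    return Counter(f for f, _, _ in stumps).most_common()


# Toy run: 6 candidate variables, of which only variable 2 determines the class.
rng = random.Random(0)
X, y = make_data(120, 6, informative=2, rng=rng)
model = fit_ensemble(X, y, n_trees=15, rng=rng)
accuracy = sum(predict(model, row) == label for row, label in zip(X, y)) / len(y)
print("feature ranking (index, times selected):", rank_features(model))
print("training accuracy:", round(accuracy, 3))
```

In this toy setting the ranking is expected to be dominated by the single informative variable, which mirrors, in a much simplified form, how a tree ensemble can reduce many candidate variables to a small interpretable subset.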

Similar articles

Redefining the Protein Kinase Conformational Space with Machine Learning.
Cell Chem Biol. 2018 Jul 19;25(7):916-924.e2. doi: 10.1016/j.chembiol.2018.05.002. Epub 2018 May 31.

The ABC of protein kinase conformations.
Biochim Biophys Acta. 2015 Oct;1854(10 Pt B):1555-66. doi: 10.1016/j.bbapap.2015.03.009. Epub 2015 Apr 1.

Structure of mitogen-activated protein kinase kinase 1 in the DFG-out conformation.
Acta Crystallogr F Struct Biol Commun. 2021 Dec 1;77(Pt 12):459-464. doi: 10.1107/S2053230X21011687. Epub 2021 Nov 25.

Defining a new nomenclature for the structures of active and inactive kinases.
Proc Natl Acad Sci U S A. 2019 Apr 2;116(14):6818-6827. doi: 10.1073/pnas.1814279116. Epub 2019 Mar 13.

References cited in this article

Highly accurate protein structure prediction with AlphaFold.
Nature. 2021 Aug;596(7873):583-589. doi: 10.1038/s41586-021-03819-2. Epub 2021 Jul 15.