Resolving Protein Conformational Plasticity and Substrate Binding via Machine Learning.

Author information

Navjeet Ahalawat, Mohammad Sahil, Jagannath Mondal

Affiliations

Department of Bioinformatics and Computational Biology, College of Biotechnology, CCS Haryana Agricultural University, Hisar 125 004, Haryana, India.

Center for Interdisciplinary Sciences, Tata Institute of Fundamental Research, Hyderabad 500046, India.

Publication information

J Chem Theory Comput. 2023 May 9;19(9):2644-2657. doi: 10.1021/acs.jctc.2c00932. Epub 2023 Apr 17.

PMID: 37068044

Abstract

A long-standing goal in elucidating biomolecular recognition is identifying the binding-competent conformations of the receptor protein. However, protein conformational plasticity and the stochastic nature of recognition often preclude assigning a specific protein conformation to an individual ligand-bound pose. Here, we demonstrate that a computational framework, coined RF-TICA-MD, which integrates the ensemble decision-tree-based Random Forest (RF) machine learning (ML) technique with the unsupervised dimension-reduction approach time-structure-based independent component analysis (TICA), provides an efficient and unambiguous way to resolve protein conformational plasticity and the substrate binding process. In particular, we consider multi-microsecond molecular dynamics (MD) simulation trajectories of ligand recognition in the solvent-inaccessible cavities of two archetypal proteins, T4 lysozyme and cytochrome P450cam. We show that when an unsupervised dimension-reduction approach cannot establish a clear correspondence between protein conformations and binding-competent macrostates, a decision-tree-based supervised classification of the simulated recognition trajectories via RF helps identify the key amino acid residue pairs of the protein that are deemed sensitive to ligand binding. A subsequent unsupervised dimension reduction of the selected residue pairs via TICA then delineates a conformational landscape of the protein that separates ligand-bound poses from unbound ones. The proposed RF-TICA-MD approach is shown to be data-agnostic and remains robust when RF is replaced by other ML classification methods such as XGBoost. As a promising spin-off of the protocol, the framework can identify distal protein sites that are allosterically important for ligand binding and characterize their roles in the recognition pathways.
A Python implementation of the proposed ML workflow is available on GitHub at https://github.com/navjeet0211/rf-tica-md.
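The two-stage idea in the abstract (supervised ranking of residue-pair features, then TICA on the selected subset) can be illustrated with a minimal sketch. This is not the authors' code from the rf-tica-md repository. It assumes scikit-learn's RandomForestClassifier with its impurity-based feature_importances_, per-frame residue-pair distances as features, per-frame bound/unbound labels, and a plain NumPy TICA built from a symmetrized time-lagged covariance. The helper names rank_residue_pairs and tica, the toy data, the lag, and the top-k cutoff are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rank_residue_pairs(distances, bound, n_top=5, seed=0):
    """Rank residue-pair features by Random Forest impurity-based importance.

    distances: (n_frames, n_pairs) residue-pair distances (assumed feature choice)
    bound:     (n_frames,) 0/1 labels, 1 = ligand-bound frame (assumed labeling)
    Returns the column indices of the n_top most important pairs.
    """
    rf = RandomForestClassifier(n_estimators=200, random_state=seed)
    rf.fit(distances, bound)
    order = np.argsort(rf.feature_importances_)[::-1]  # most important first
    return order[:n_top]


def tica(features, lag=10, n_components=2, eps=1e-10):
    """Project features onto the slowest TICA components at a lag given in frames.

    Solves Ct v = lambda C0 v by whitening with C0^(-1/2) and then
    diagonalizing the whitened, symmetrized time-lagged covariance.
    Returns the projection and the eigenvalues (lagged autocorrelations).
    """
    x = features - features.mean(axis=0)
    x0, xt = x[:-lag], x[lag:]
    c0 = x.T @ x / len(x)                                # instantaneous covariance
    ct = (x0.T @ xt + xt.T @ x0) / (2 * len(x0))         # symmetrized lagged covariance
    s, u = np.linalg.eigh(c0)
    keep = s > eps                                       # drop near-singular directions
    w = u[:, keep] / np.sqrt(s[keep])                    # whitening transform U diag(s^-1/2)
    evals, evecs = np.linalg.eigh(w.T @ ct @ w)
    idx = np.argsort(evals)[::-1][:n_components]         # slowest = largest eigenvalue
    return x @ (w @ evecs[:, idx]), evals[idx]


# Hypothetical toy data: 2000 frames, 20 residue-pair distances (arbitrary units).
rng = np.random.default_rng(0)
n_frames, n_pairs = 2000, 20
bound = (np.arange(n_frames) // 100) % 2           # bound/unbound blocks of 100 frames
distances = rng.normal(1.0, 0.1, size=(n_frames, n_pairs))
distances[:, 3] += 0.5 * bound                     # only pairs 3 and 7 respond to binding
distances[:, 7] -= 0.4 * bound

top = rank_residue_pairs(distances, bound, n_top=2)
proj, autocorr = tica(distances[:, top], lag=10, n_components=1)
print("selected pairs:", sorted(top.tolist()))
print("slowest TIC autocorrelation at lag 10:", round(float(autocorr[0]), 3))
```

In this toy setup the two responsive pairs are the ones the Random Forest should rank highest, and the slowest TIC built from them tracks the bound/unbound switching. Replacing RandomForestClassifier with another classifier that exposes feature_importances_, for example XGBoost's XGBClassifier, changes only the ranking step, which mirrors the classifier swap the abstract reports as robust.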
