Resolving Protein Conformational Plasticity and Substrate Binding via Machine Learning.

Author information

Ahalawat Navjeet, Sahil Mohammad, Mondal Jagannath

Affiliations

Department of Bioinformatics and Computational Biology, College of Biotechnology, CCS Haryana Agricultural University, Hisar 125 004, Haryana, India.

Center for Interdisciplinary Sciences, Tata Institute of Fundamental Research, Hyderabad 500046, India.

Publication information

J Chem Theory Comput. 2023 May 9;19(9):2644-2657. doi: 10.1021/acs.jctc.2c00932. Epub 2023 Apr 17.

Abstract

A long-standing goal in elucidating the biomolecular recognition process is identifying the binding-competent conformations of the receptor protein. However, protein conformational plasticity and the stochastic nature of recognition processes often preclude assigning a specific protein conformation to an individual ligand-bound pose. Here, we demonstrate that a computational framework termed RF-TICA-MD, which integrates the ensemble decision-tree-based Random Forest (RF) machine learning (ML) technique with an unsupervised dimensionality-reduction approach, time-structured independent component analysis (TICA), provides an efficient and unambiguous way to resolve protein conformational plasticity and the substrate binding process. In particular, we consider multimicrosecond-long molecular dynamics (MD) simulation trajectories of ligand recognition in the solvent-inaccessible cavities of the archetypal proteins T4 lysozyme and cytochrome P450cam. We show that when an unsupervised dimensionality-reduction approach fails to yield a clear correspondence between protein conformation and binding-competent macrostates, a decision-tree-based supervised classification of the simulated recognition trajectories via RF helps characterize the key amino acid residue pairs of the protein that are deemed sensitive to ligand binding. A subsequent unsupervised dimensionality reduction of the selected residue pairs via TICA then delineates a conformational landscape of the protein that demarcates ligand-bound poses from unbound ones. The proposed RF-TICA-MD approach is shown to be data-agnostic and remains robust when other ML-based classification methods, such as XGBoost, are used. As a promising spinoff of the protocol, the framework can identify distal protein locations that are allosterically important for ligand binding and characterize their roles in recognition pathways.
A Python implementation of the proposed ML workflow is available on GitHub at https://github.com/navjeet0211/rf-tica-md.
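The two-stage idea in the abstract (supervised RF ranking of residue-pair features, then unsupervised TICA on only the top-ranked pairs) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' published implementation. It assumes each MD frame has already been featurized into residue-pair distances and given a binary bound/unbound label. It uses scikit-learn's RandomForestClassifier and its impurity-based feature_importances_ for the ranking step. TICA is implemented directly in NumPy as the generalized eigenproblem C(tau) v = lambda C(0) v. The helper names rank_features_rf and tica, the lag time, the number of retained pairs, and the synthetic data are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rank_features_rf(X, y, n_estimators=200, seed=0):
    """Hypothetical helper: rank feature columns (e.g. residue-pair distances)
    by Random Forest impurity-based importance for bound (1) vs unbound (0)."""
    rf = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
    rf.fit(X, y)
    order = np.argsort(rf.feature_importances_)[::-1]  # most important first
    return order, rf.feature_importances_


def tica(X, lag, n_components=2, eps=1e-10):
    """Hypothetical minimal TICA: solve C(tau) v = lambda C(0) v on mean-free
    data and project onto the slowest components (largest eigenvalues)."""
    X = X - X.mean(axis=0)
    X0, Xt = X[:-lag], X[lag:]
    n = X0.shape[0]
    # Symmetrized instantaneous and time-lagged covariance estimates.
    c0 = (X0.T @ X0 + Xt.T @ Xt) / (2 * n)
    ct = (X0.T @ Xt + Xt.T @ X0) / (2 * n)
    # Whiten with C(0)^(-1/2), dropping near-singular directions.
    s, U = np.linalg.eigh(c0)
    keep = s > eps
    W = U[:, keep] / np.sqrt(s[keep])
    # Diagonalize the whitened lagged covariance; sort slowest modes first.
    lam, V = np.linalg.eigh(W.T @ ct @ W)
    idx = np.argsort(lam)[::-1][:n_components]
    return X @ (W @ V[:, idx]), lam[idx]


# Hypothetical toy data: 2000 frames x 20 "residue-pair distances".
# Only columns 3 and 7 shift with a slowly switching bound/unbound state.
rng = np.random.default_rng(0)
n_frames, n_pairs = 2000, 20
y = (np.sin(np.arange(n_frames) / 150.0) > 0).astype(int)  # 1 = bound (toy label)
X = rng.normal(size=(n_frames, n_pairs))
X[:, 3] += 4.0 * y
X[:, 7] -= 4.0 * y

# Stage 1 (RF): keep the top-ranked pairs; k = 2 is an arbitrary choice here.
order, importances = rank_features_rf(X, y)
top = order[:2]

# Stage 2 (TICA): project only the selected pairs onto the slowest mode.
proj, tica_eigs = tica(X[:, top], lag=10, n_components=1)
print("selected columns:", sorted(top.tolist()))
print("first TICA eigenvalue:", round(float(tica_eigs[0]), 3))
```

On this toy data the forest should rank the two planted columns highest, and the first TICA component then tracks the hidden bound/unbound switch. Replacing RandomForestClassifier with another classifier that exposes feature importances (the abstract mentions XGBoost) would leave the TICA stage unchanged.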
