Conev Anja, Rigo Mauricio Menagatti, Devaurs Didier, Fonseca André Faustino, Kalavadwala Hussain, de Freitas Martiela Vaz, Clementi Cecilia, Zanatta Geancarlo, Antunes Dinler Amaral, Kavraki Lydia
Department of Computer Science, Rice University, Houston, TX 77005, USA.
MRC Institute of Genetics and Cancer, University of Edinburgh, EH4 2XU, UK.
bioRxiv. 2023 Apr 28:2023.04.24.538094. doi: 10.1101/2023.04.24.538094.
Proteins are dynamic macromolecules that perform vital functions in cells. A protein's structure determines its function, but this structure is not static: proteins change conformation to carry out their various functions. Understanding the conformational landscape of a protein is therefore essential to understanding its mechanism of action. Sets of carefully chosen conformations can summarize such complex landscapes and provide better insight into protein function than single conformations. We refer to these sets as representative conformational ensembles. Recent advances in computational methods have led to an increase in the number of available structural datasets spanning conformational landscapes. However, extracting representative conformational ensembles from such datasets is not an easy task, and many methods have been developed to tackle it. Our new approach, EnGens (short for ensemble generation), collects these methods into a unified framework for generating and analyzing protein conformational ensembles. In this work we: (1) provide an overview of existing methods and tools for protein structural ensemble generation and analysis; (2) unify existing approaches in an open-source Python package and a portable Docker image, providing interactive visualizations within a Jupyter Notebook pipeline; (3) test our pipeline on a few canonical examples found in the literature. Representative ensembles produced by EnGens can be used for many downstream tasks, such as protein-ligand ensemble docking, Markov state modeling of protein dynamics, and analysis of the effect of single-point mutations.
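To make the idea of extracting a representative conformational ensemble concrete, the following is a minimal sketch of one common strategy: cluster featurized conformations and keep the frame closest to each cluster center. It does not use or reproduce the EnGens API. The function name `representative_ensemble`, its parameters, and the choice of k-means with farthest-point initialization are assumptions made purely for illustration, and the feature matrix is assumed to have been computed beforehand (for example from dihedral angles or pairwise distances).

```python
import numpy as np


def representative_ensemble(features, n_clusters, n_iter=50):
    """Return (representative frame indices, cluster labels).

    features: (n_frames, n_features) array with one row per conformation.
    This is an illustrative helper, not part of EnGens.
    """
    X = np.asarray(features, dtype=float)

    # Deterministic farthest-point initialization: start at frame 0, then
    # repeatedly add the frame farthest from all centers chosen so far.
    center_idx = [0]
    dist = np.linalg.norm(X - X[0], axis=1)
    for _ in range(1, n_clusters):
        nxt = int(np.argmax(dist))
        center_idx.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    centers = X[center_idx].copy()

    # Plain Lloyd iterations (k-means): assign frames, then update centers.
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    # Final assignment; the representative of cluster k is the frame
    # closest to center k.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    reps = [int(np.argmin(d[:, k])) for k in range(n_clusters)]
    return reps, labels
```

Calling `representative_ensemble(features, n_clusters=3)` on a feature matrix returns three frame indices that could then be pulled from the trajectory as the ensemble. In practice the featurization, the dimensionality reduction step, and the clustering algorithm are all choices that a framework of this kind would let the user vary.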