Vögele Martin, Thomson Neil J, Truong Sang T, McAvity Jasper, Zachariae Ulrich, Dror Ron O
Department of Computer Science, Stanford University, Stanford, California 94305, USA.
Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, California 94305, USA.
J Chem Phys. 2025 Jan 7;162(1). doi: 10.1063/5.0235544.
Atomic-level simulations are widely used to study biomolecules and their dynamics. A common goal in such studies is to compare simulations of a molecular system under several conditions (for example, with various mutations or bound ligands) in order to identify differences between the molecular conformations adopted under these conditions. However, the large amount of data produced by simulations of ever larger and more complex systems often makes it difficult to identify the structural features that are relevant to a particular biochemical phenomenon. We present a flexible software package named Python ENSemble Analysis (PENSA) that enables a comprehensive investigation of biomolecular conformational ensembles. It provides featurization and feature transformations that allow for a complete representation of biomolecules such as proteins and nucleic acids, including water and ion binding sites, thus avoiding the bias that would come with manual feature selection. PENSA implements methods to systematically compare the distributions of molecular features across ensembles, find the significant differences between them, and identify regions of interest. It also includes a novel approach to quantify the state-specific information shared between two regions of a biomolecule, which allows, for example, tracing information flow to identify allosteric pathways. PENSA also comes with convenient tools for loading data and visualizing results, making data quick to process and results easy to interpret. PENSA is an open-source Python library maintained at https://github.com/drorlab/pensa along with an example workflow and a tutorial. We demonstrate its usefulness in real-world examples by showing how it helps us determine molecular mechanisms efficiently.
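To make the idea of comparing feature distributions across ensembles concrete, the following is a minimal sketch in plain NumPy. It is not PENSA's API: the function names `js_distance` and `rank_features`, the choice of the Jensen-Shannon distance as the comparison metric, the bin count, and the synthetic data are all assumptions made for illustration. Each ensemble is assumed to be an array of shape (frames, features) holding non-periodic scalar features.

```python
import numpy as np


def js_distance(samples_a, samples_b, bins=30):
    """Jensen-Shannon distance (log base 2, range 0..1) between two 1D samples.

    Both samples are histogrammed on a shared grid so the bins line up.
    """
    lo = min(samples_a.min(), samples_b.min())
    hi = max(samples_a.max(), samples_b.max())
    if hi == lo:
        # Every value is identical in both samples: no difference to measure.
        return 0.0
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(samples_a, bins=edges)
    q, _ = np.histogram(samples_b, bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(x, y):
        # Kullback-Leibler divergence, skipping empty bins of x.
        mask = x > 0
        return np.sum(x[mask] * np.log2(x[mask] / y[mask]))

    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return float(np.sqrt(max(jsd, 0.0)))


def rank_features(ensemble_a, ensemble_b, names, bins=30):
    """Return (name, distance) pairs, most different feature first."""
    scores = [
        (name, js_distance(ensemble_a[:, i], ensemble_b[:, i], bins))
        for i, name in enumerate(names)
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Synthetic stand-in for two simulation conditions (hypothetical data):
# three features over 5000 frames, where only feature_1 differs.
rng = np.random.default_rng(0)
names = ["feature_0", "feature_1", "feature_2"]
ensemble_a = rng.normal(size=(5000, 3))
ensemble_b = rng.normal(size=(5000, 3))
ensemble_b[:, 1] += 3.0  # shift one feature to mimic a condition-dependent change

ranking = rank_features(ensemble_a, ensemble_b, names)
for name, dist in ranking:
    print(f"{name}: {dist:.3f}")
```

Ranking every feature by such a distance, rather than inspecting a hand-picked subset, is what lets the comparison avoid manual feature selection. Periodic features such as torsion angles would need a periodicity-aware treatment that this sketch omits.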