Information-theoretical measures identify accurate low-resolution representations of protein configurational space.

Authors

Mele Margherita, Covino Roberto, Potestio Raffaello

Affiliations

Physics Department, University of Trento, via Sommarive, 14 I-38123 Trento, Italy.

Frankfurt Institute for Advanced Studies, 60438 Frankfurt am Main, Germany.

Publication information

Soft Matter. 2022 Sep 28;18(37):7064-7074. doi: 10.1039/d2sm00636g.

DOI:10.1039/d2sm00636g

PMID:36070256

Abstract

The steadily growing computational power employed to perform molecular dynamics simulations of biological macromolecules represents at the same time an immense opportunity and a formidable challenge. In fact, large amounts of data are produced, from which useful, synthetic, and intelligible information has to be extracted to make the crucial step from knowing to understanding. Here we tackled the problem of coarsening the conformational space sampled by proteins in the course of molecular dynamics simulations. We applied different schemes to cluster the frames of a dataset of protein simulations; we then employed an information-theoretical framework, based on the notion of resolution and relevance, to gauge how well the various clustering methods accomplish this simplification of the configurational space. Our approach allowed us to identify the level of resolution that optimally balances simplicity and informativeness; furthermore, we found that the most physically accurate clustering procedures are those that induce an ultrametric structure of the low-resolution space, consistently with the hypothesis that the protein conformational landscape has a self-similar organisation. The proposed strategy is general and its applicability extends beyond that of computational biophysics, making it a valuable tool to extract useful information from large datasets.
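
As an illustration of how a clustering of simulation frames can be scored with information-theoretical measures, the sketch below computes two entropies that are commonly called resolution and relevance. It is a minimal sketch and not the authors' code: the function name `resolution_and_relevance` and the toy labels are hypothetical, and the formulas are an assumption based on the standard formulation, in which resolution is the entropy of the cluster-label distribution and relevance is the entropy of the cluster-size distribution.

```python
import math
from collections import Counter


def resolution_and_relevance(labels):
    """Return (resolution, relevance) in nats for one cluster label per frame.

    Assumed definitions (not taken from the record above):
      resolution H[s] = -sum_s (k_s / N) * log(k_s / N)
      relevance  H[k] = -sum_k (k * m_k / N) * log(k * m_k / N)
    where k_s is the number of frames in cluster s, m_k is the number of
    clusters containing exactly k frames, and N is the number of frames.
    """
    n = len(labels)
    if n == 0:
        raise ValueError("labels must not be empty")

    # k_s: how many frames fall in each cluster
    sizes = Counter(labels)
    # m_k: how many clusters have exactly k frames
    multiplicity = Counter(sizes.values())

    resolution = -sum((k / n) * math.log(k / n) for k in sizes.values())
    relevance = -sum(
        (k * m / n) * math.log(k * m / n) for k, m in multiplicity.items()
    )
    return resolution, relevance


if __name__ == "__main__":
    # Hypothetical labels for eight frames grouped at three levels of detail.
    coarse = [0, 0, 0, 0, 0, 0, 0, 0]   # every frame in one cluster
    medium = [0, 0, 0, 0, 1, 1, 2, 3]   # clusters of different sizes
    fine = [0, 1, 2, 3, 4, 5, 6, 7]     # every frame in its own cluster
    for name, labels in [("coarse", coarse), ("medium", medium), ("fine", fine)]:
        h_s, h_k = resolution_and_relevance(labels)
        print(f"{name}: resolution={h_s:.3f} relevance={h_k:.3f}")
```

Under these assumed definitions, the relevance vanishes at both extremes (a single cluster, or one cluster per frame) and is largest at intermediate levels of detail, which is one way to read the balance between simplicity and informativeness mentioned in the abstract.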

Similar articles

Macromolecular crowding: chemistry and physics meet biology (Ascona, Switzerland, 10-14 June 2012).

Phys Biol. 2013 Aug;10(4):040301. doi: 10.1088/1478-3975/10/4/040301. Epub 2013 Aug 2.

Protein Structure Validation Derives a Smart Conformational Search in a Physically Relevant Configurational Subspace.

J Chem Inf Model. 2022 Dec 12;62(23):6217-6227. doi: 10.1021/acs.jcim.2c01173. Epub 2022 Nov 30.

Dissecting Protein Configurational Entropy into Conformational and Vibrational Contributions.

J Phys Chem B. 2015 Oct 1;119(39):12623-31. doi: 10.1021/acs.jpcb.5b07060. Epub 2015 Sep 16.

Cluster analysis of molecular simulation trajectories for systems where both conformation and orientation of the sampled states are important.

J Comput Chem. 2016 Aug 5;37(21):1973-82. doi: 10.1002/jcc.24416. Epub 2016 Jun 12.

Validating clustering of molecular dynamics simulations using polymer models.

BMC Bioinformatics. 2011 Nov 14;12:445. doi: 10.1186/1471-2105-12-445.

Unrolr: Structural analysis of protein conformations using stochastic proximity embedding.

J Comput Chem. 2018 Nov 15;39(30):2551-2557. doi: 10.1002/jcc.25599.

Effect of Clustering Algorithm on Establishing Markov State Model for Molecular Dynamics Simulations.

J Chem Inf Model. 2016 Jun 27;56(6):1205-15. doi: 10.1021/acs.jcim.6b00181. Epub 2016 Jun 8.

CAVER: Algorithms for Analyzing Dynamics of Tunnels in Macromolecules.

IEEE/ACM Trans Comput Biol Bioinform. 2016 May-Jun;13(3):505-17. doi: 10.1109/TCBB.2015.2459680.

Computational models of protein kinematics and dynamics: beyond simulation.

Annu Rev Anal Chem (Palo Alto Calif). 2012;5:273-91. doi: 10.1146/annurev-anchem-062011-143024. Epub 2012 Apr 9.

Cited by

Quality assessment and community detection methods for anonymized mobility data in the Italian Covid context.

Sci Rep. 2024 Feb 26;14(1):4636. doi: 10.1038/s41598-024-54878-0.
