Mele Margherita, Covino Roberto, Potestio Raffaello
Physics Department, University of Trento, via Sommarive, 14 I-38123 Trento, Italy.
Frankfurt Institute for Advanced Studies, 60438 Frankfurt am Main, Germany.
Soft Matter. 2022 Sep 28;18(37):7064-7074. doi: 10.1039/d2sm00636g.
The steadily growing computational power employed to perform molecular dynamics simulations of biological macromolecules represents at the same time an immense opportunity and a formidable challenge. Large amounts of data are produced, from which useful, synthetic, and intelligible information has to be extracted to make the crucial step from knowing to understanding. Here we tackled the problem of coarsening the conformational space sampled by proteins in the course of molecular dynamics simulations. We applied different schemes to cluster the frames of a dataset of protein simulations; we then employed an information-theoretical framework, based on the notions of resolution and relevance, to gauge how well the various clustering methods accomplish this simplification of the configurational space. Our approach allowed us to identify the level of resolution that optimally balances simplicity and informativeness; furthermore, we found that the most physically accurate clustering procedures are those that induce an ultrametric structure in the low-resolution space, consistent with the hypothesis that the protein conformational landscape has a self-similar organisation. The proposed strategy is general and applies beyond computational biophysics, making it a valuable tool for extracting useful information from large datasets.
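To illustrate the kind of quantity such an information-theoretical assessment of a clustering might evaluate, the following is a minimal sketch, not the authors' implementation. It assumes definitions that the abstract does not spell out: resolution is taken as the Shannon entropy of the cluster-label distribution, and relevance as the entropy of the distribution of cluster sizes. The function name `resolution_and_relevance` and the integer-label input format are hypothetical choices made for this example.

```python
import math
from collections import Counter


def resolution_and_relevance(labels):
    """Return (resolution, relevance) in nats for a sequence of cluster labels.

    Assumed definitions (not stated in the abstract):
      resolution H[s] = -sum_s (k_s / N) * log(k_s / N)
      relevance  H[k] = -sum_k (k * m_k / N) * log(k * m_k / N)
    where k_s is the number of frames in cluster s, m_k is the number of
    clusters containing exactly k frames, and N is the number of frames.
    """
    n = len(labels)
    if n == 0:
        raise ValueError("labels must not be empty")

    # k_s: how many frames were assigned to each cluster
    cluster_sizes = Counter(labels)
    resolution = -sum(
        (k / n) * math.log(k / n) for k in cluster_sizes.values()
    )

    # m_k: how many clusters share the same size k
    size_multiplicity = Counter(cluster_sizes.values())
    relevance = -sum(
        (k * m / n) * math.log(k * m / n)
        for k, m in size_multiplicity.items()
    )
    return resolution, relevance


if __name__ == "__main__":
    # Hypothetical labelling of 8 frames into 3 clusters of sizes 4, 2, 2
    example_labels = [0, 0, 0, 0, 1, 1, 2, 2]
    h_s, h_k = resolution_and_relevance(example_labels)
    print(f"resolution = {h_s:.4f} nats, relevance = {h_k:.4f} nats")
```

Under these assumed definitions, relevance vanishes at both extremes: when all frames fall in a single cluster, and when every frame is its own cluster. Scanning the number of clusters and tracking both values is one way to look for an intermediate level of detail that balances simplicity against informativeness.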