Information-theoretical measures identify accurate low-resolution representations of protein configurational space.

Author information

Mele Margherita, Covino Roberto, Potestio Raffaello

Affiliations

Physics Department, University of Trento, via Sommarive, 14 I-38123 Trento, Italy.

Frankfurt Institute for Advanced Studies, 60438 Frankfurt am Main, Germany.

Publication information

Soft Matter. 2022 Sep 28;18(37):7064-7074. doi: 10.1039/d2sm00636g.

Abstract

The steadily growing computational power employed to perform molecular dynamics simulations of biological macromolecules represents at the same time an immense opportunity and a formidable challenge. In fact, large amounts of data are produced, from which useful, synthetic, and intelligible information has to be extracted to make the crucial step from knowing to understanding. Here we tackled the problem of coarsening the conformational space sampled by proteins in the course of molecular dynamics simulations. We applied different schemes to cluster the frames of a dataset of protein simulations; we then employed an information-theoretical framework, based on the notions of entropy and mutual information, to gauge how well the various clustering methods accomplish this simplification of the configurational space. Our approach allowed us to identify the level of resolution that optimally balances simplicity and informativeness; furthermore, we found that the most physically accurate clustering procedures are those that induce an ultrametric structure of the low-resolution space, consistent with the hypothesis that the protein conformational landscape has a self-similar organisation. The proposed strategy is general and its applicability extends beyond computational biophysics, making it a valuable tool to extract useful information from large datasets.
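
To make the information-theoretical evaluation of a clustering more concrete, here is a minimal sketch of how the Shannon entropy of a cluster assignment, and the mutual information between two assignments of the same frames, can be computed from label counts. The abstract gives no formulas, so this is a generic illustration and not the authors' implementation; it assumes each simulation frame has already been given an integer cluster label by some clustering scheme, and the helper names `shannon_entropy` and `mutual_information` are hypothetical.

```python
from collections import Counter
from math import log2


def shannon_entropy(labels):
    """Entropy (in bits) of the empirical distribution of cluster labels.

    Hypothetical helper: `labels` holds one cluster label per frame.
    """
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(labels_a, labels_b):
    """Mutual information I(A;B) = H(A) + H(B) - H(A,B), in bits.

    Hypothetical helper: both labelings must refer to the same frames,
    in the same order.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same frames")
    joint = list(zip(labels_a, labels_b))  # joint label of each frame
    return (shannon_entropy(labels_a)
            + shannon_entropy(labels_b)
            - shannon_entropy(joint))


if __name__ == "__main__":
    # Toy data (not from the paper): 8 frames, a fine clustering with
    # 4 equally populated clusters and a coarse one that merges them in pairs.
    fine = [0, 0, 1, 1, 2, 2, 3, 3]
    coarse = [0, 0, 0, 0, 1, 1, 1, 1]
    print(shannon_entropy(fine))              # 2.0
    print(shannon_entropy(coarse))            # 1.0
    print(mutual_information(fine, coarse))   # 1.0
```

In this toy example the coarse labels are obtained by merging fine clusters, so the coarse labeling is a function of the fine one; the joint entropy then equals the fine entropy, and the mutual information reduces to the entropy of the coarse labeling (1 bit).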
