Physics Department, University of Trento, via Sommarive 14, I-38123 Trento, Italy.
INFN-TIFPA, Trento Institute for Fundamental Physics and Applications, I-38123 Trento, Italy.
J Chem Theory Comput. 2020 Nov 10;16(11):6795-6813. doi: 10.1021/acs.jctc.0c00676. Epub 2020 Oct 27.
In theoretical modeling of a physical system, a crucial step consists of the identification of those degrees of freedom that enable a synthetic yet informative representation of it. While in some cases this selection can be carried out on the basis of intuition and experience, straightforward discrimination of the important features from the negligible ones is difficult for many complex systems, most notably heteropolymers and large biomolecules. We here present a thermodynamics-based theoretical framework to gauge the effectiveness of a given simplified representation by measuring its information content. We employ this method to identify those reduced descriptions of proteins, in terms of a subset of their atoms, that retain the largest amount of information from the original model; we show that these highly informative representations share common features that are intrinsically related to the biological properties of the proteins under examination, thereby establishing a bridge between protein structure, energetics, and function.
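As a toy illustration of what "measuring the information content" of a reduced representation can mean, the sketch below scores every way of keeping a subset of sites in a small discrete system. It assumes, purely for illustration, that the information loss is the Kullback-Leibler divergence between the full distribution and the one obtained by spreading each coarse-grained state's probability uniformly over the microstates that map onto it. The abstract does not spell out the measure, so this choice, the three-site binary system, and the names `toy_distribution`, `information_loss`, and `rank_subsets` are all hypothetical and are not the authors' implementation.

```python
import itertools
import math


def toy_distribution():
    """Hypothetical 3-site binary system (not from the paper).

    Site 0 is strongly biased, site 1 tends to copy site 0,
    and site 2 is uniform and independent of the others.
    """
    probs = {}
    for s0, s1, s2 in itertools.product((0, 1), repeat=3):
        p0 = 0.9 if s0 == 1 else 0.1
        p1 = 0.8 if s1 == s0 else 0.2
        p2 = 0.5
        probs[(s0, s1, s2)] = p0 * p1 * p2
    return probs


def information_loss(probs, kept):
    """KL divergence between p and its coarse-grained, smeared version.

    `kept` lists the site indices retained by the reduced representation.
    Assumed loss measure: sum_r p(r) * ln(p(r) / (P(R) / Omega(R))), where
    R is the reduced state of r, P(R) its total probability, and Omega(R)
    the number of microstates that map onto it.
    """
    macro_prob = {}
    macro_count = {}
    for state, p in probs.items():
        key = tuple(state[i] for i in kept)
        macro_prob[key] = macro_prob.get(key, 0.0) + p
        macro_count[key] = macro_count.get(key, 0) + 1

    loss = 0.0
    for state, p in probs.items():
        if p > 0.0:
            key = tuple(state[i] for i in kept)
            smeared = macro_prob[key] / macro_count[key]
            loss += p * math.log(p / smeared)
    return loss


def rank_subsets(probs, n_sites, n_kept):
    """Return (loss, kept_sites) pairs, smallest loss first."""
    scored = [
        (information_loss(probs, kept), kept)
        for kept in itertools.combinations(range(n_sites), n_kept)
    ]
    return sorted(scored)


if __name__ == "__main__":
    probs = toy_distribution()
    for loss, kept in rank_subsets(probs, n_sites=3, n_kept=1):
        print(kept, round(loss, 4))
```

Keeping every site loses nothing under this measure, while different one-site representations lose different amounts; ranking the subsets by loss is the toy analogue of searching for the most informative reduced description. For a protein the number of possible atom subsets is far too large to enumerate, so a search of this exhaustive kind would have to be replaced by some form of optimization.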