Uncertainties in Markov State Models of Small Proteins.

Affiliation

Department of Theoretical and Computational Biophysics, Max-Planck-Institute for Multidisciplinary Sciences, Göttingen 37077, Germany.

Publication information

J Chem Theory Comput. 2023 Aug 22;19(16):5516-5524. doi: 10.1021/acs.jctc.3c00372. Epub 2023 Aug 4.

Abstract

Markov state models are widely used to describe and analyze protein dynamics based on molecular dynamics simulations, specifically to extract functionally relevant characteristic time scales and motions. Particularly for larger biomolecules such as proteins, however, insufficient sampling is a notorious concern and often the source of large uncertainties that are difficult to quantify. Furthermore, there are several other sources of uncertainty, such as the choice of the number of Markov states and of the lag time, the choice and parameters of the dimension-reduction preprocessing step, and the uncertainty due to the limited number of observed transitions; the latter is often estimated via a Bayesian approach. Here, we quantified and ranked all of these uncertainties for four small globular test proteins. We found that the largest uncertainty is due to insufficient sampling; it initially increases with the total trajectory length up to a critical tipping point, after which it decreases as the trajectory length grows further, thus providing guidelines for how much sampling is required for a given accuracy. We also found that single long trajectories yielded better sampling accuracy than many shorter trajectories starting from the same structure. In comparison, the remaining sources of uncertainty listed above are generally smaller by a factor of about 5, rendering them less of a concern but certainly not negligible. Importantly, the Bayes uncertainty, commonly used as the only uncertainty estimate, captures only a relatively small part of the true uncertainty, which is therefore often drastically underestimated.
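
The abstract separates the Bayes uncertainty, which reflects only the limited number of observed transitions, from the total uncertainty. The Python sketch below is a minimal illustration under stated assumptions, not the authors' workflow: it starts from an already-discretized state trajectory, estimates a transition matrix at lag time tau, computes implied time scales t_k = -tau / ln|lambda_k|, and compares a Bayes spread with the empirical spread over independent replicate trajectories. The helper names (count_matrix, ml_transition_matrix, implied_timescales, bayes_timescales, simulate), the three-state toy chain, and the row-wise Dirichlet posterior with a uniform prior are hypothetical choices; established Bayesian MSM implementations usually also enforce detailed balance, which this sketch does not.

```python
import numpy as np


def count_matrix(dtraj, n_states, lag):
    """Count transitions i -> j at lag `lag`, using non-overlapping (lag-sampled) pairs."""
    C = np.zeros((n_states, n_states))
    np.add.at(C, (dtraj[:-lag:lag], dtraj[lag::lag]), 1.0)
    return C


def ml_transition_matrix(C):
    """Row-normalized, non-reversible maximum-likelihood estimate.
    Assumes every state has at least one outgoing count."""
    return C / C.sum(axis=1, keepdims=True)


def implied_timescales(T, lag):
    """t_k = -lag / ln|lambda_k| for every eigenvalue except the stationary one."""
    evals = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    return -lag / np.log(evals[1:])


def bayes_timescales(C, lag, n_samples=500, seed=None):
    """Posterior samples of the implied time scales.
    Simplifying assumption: each row of T has an independent Dirichlet(counts + 1)
    posterior (uniform prior, detailed balance NOT enforced)."""
    rng = np.random.default_rng(seed)
    out = np.empty((n_samples, len(C) - 1))
    for s in range(n_samples):
        T = np.vstack([rng.dirichlet(row + 1.0) for row in C])
        out[s] = implied_timescales(T, lag)
    return out


def simulate(T, n_steps, start=0, seed=None):
    """Draw a discrete trajectory of length n_steps from transition matrix T."""
    rng = np.random.default_rng(seed)
    cum = np.cumsum(T, axis=1)
    u = rng.random(n_steps)
    traj = np.empty(n_steps, dtype=int)
    traj[0] = start
    for t in range(1, n_steps):
        # first state whose cumulative probability exceeds u[t]; min() guards round-off
        traj[t] = min(int(np.searchsorted(cum[traj[t - 1]], u[t], side="right")), len(T) - 1)
    return traj


if __name__ == "__main__":
    # Hypothetical 3-state toy chain (not protein data); eigenvalues 1, 0.98, 0.96.
    T_true = np.array([[0.98, 0.02, 0.00],
                       [0.01, 0.98, 0.01],
                       [0.00, 0.02, 0.98]])
    lag, n_steps = 5, 50_000
    print("exact time scales:", implied_timescales(T_true, 1))

    # Bayes uncertainty from a single trajectory
    C = count_matrix(simulate(T_true, n_steps, seed=0), 3, lag)
    print("ML estimate:", implied_timescales(ml_transition_matrix(C), lag))
    print("Bayes std:", bayes_timescales(C, lag, seed=1).std(axis=0))

    # Empirical spread over independent replicate trajectories
    reps = [implied_timescales(ml_transition_matrix(
                count_matrix(simulate(T_true, n_steps, seed=k), 3, lag)), lag)
            for k in range(2, 7)]
    print("replicate std:", np.std(reps, axis=0))
```

Counting only non-overlapping, lag-spaced pairs keeps the counts approximately independent, which the simple Dirichlet posterior assumes; treating sliding-window counts as independent would make the posterior too narrow. In this exactly Markovian toy model with given microstates, the Bayes and replicate spreads should be roughly comparable, because the other sources named in the abstract (state definition, lag-time choice, dimension reduction, unconverged sampling of real protein trajectories) are absent by construction.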

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/70b5/10448719/f88c249ff4ca/ct3c00372_0001.jpg