Improved Data-Driven Collective Variables for Biased Sampling through Iteration on Biased Data.

Author information

Sasmal Subarna, McCullagh Martin, Hocky Glen M

Affiliations

Department of Chemistry and Simons Center for Computational Physical Chemistry, New York University, New York, New York 10003, United States.

Department of Chemistry, Oklahoma State University, Stillwater, Oklahoma 74078, United States.

Publication information

J Phys Chem B. 2025 Jun 26;129(25):6163-6171. doi: 10.1021/acs.jpcb.5c02164. Epub 2025 Jun 12.

Abstract

Our ability to efficiently sample conformational transitions between two known states of a biomolecule using collective variable (CV)-based sampling depends strongly on the choice of the CV. We previously reported a data-driven approach to clustering biomolecular configurations with a probabilistic clustering model termed shapeGMM. ShapeGMM is a Gaussian mixture model in Cartesian coordinates, with means and covariances in each cluster representing the harmonic approximation to the conformational ensemble around a metastable state. We subsequently showed that linear discriminant analysis on positions (posLDA) produces good reaction coordinates to characterize the transition between two of these states, and moreover, they can be biased to produce transitions between the states using metadynamics-like approaches. However, the quality of these posLDA coordinates depends on the amount of data used to characterize the states, and here, we demonstrate the ability to systematically improve them using enhanced sampling data. Specifically, we demonstrate that improved CVs for sampling can be generated by iteratively performing biased sampling along a posLDA coordinate and then generating a new shapeGMM model from biased data from the previous iteration. The new coordinates derived from our iterative approach show a substantial improvement in being able to induce transitions between metastable states and to converge a free energy surface.
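The posLDA step mentioned in the abstract can be illustrated with a toy two-state example. The following is a minimal sketch, not the authors' implementation. It assumes frames have already been assigned to two states (in the paper the states come from shapeGMM clustering, which is not reproduced here), it uses 2D points in place of aligned Cartesian coordinates, and it accepts per-frame weights as a stand-in for whatever reweighting is applied to biased data. All function names and the toy data are hypothetical.

```python
import random


def weighted_mean(points, weights):
    """Weighted mean of a list of 2D points."""
    total = sum(weights)
    return [sum(w * p[d] for p, w in zip(points, weights)) / total
            for d in range(2)]


def weighted_cov(points, weights, mean):
    """Weighted 2x2 covariance, normalized by the total weight."""
    total = sum(weights)
    cov = [[0.0, 0.0], [0.0, 0.0]]
    for p, w in zip(points, weights):
        dx = [p[0] - mean[0], p[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                cov[i][j] += w * dx[i] * dx[j] / total
    return cov


def lda_direction(state_a, w_a, state_b, w_b):
    """Two-class LDA direction: S_w^{-1} (mu_b - mu_a), unit length."""
    mu_a = weighted_mean(state_a, w_a)
    mu_b = weighted_mean(state_b, w_b)
    cov_a = weighted_cov(state_a, w_a, mu_a)
    cov_b = weighted_cov(state_b, w_b, mu_b)
    # Within-class scatter, taken here as the average of the two covariances.
    sw = [[0.5 * (cov_a[i][j] + cov_b[i][j]) for j in range(2)]
          for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu_b[0] - mu_a[0], mu_b[1] - mu_a[1]]
    v = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    norm = (v[0] ** 2 + v[1] ** 2) ** 0.5
    return [v[0] / norm, v[1] / norm]


# Toy data (assumption): two metastable states centered at x = -1 and x = +1.
rng = random.Random(0)
state_a = [[rng.gauss(-1.0, 0.2), rng.gauss(0.0, 0.2)] for _ in range(200)]
state_b = [[rng.gauss(1.0, 0.2), rng.gauss(0.0, 0.2)] for _ in range(200)]

# Uniform weights correspond to unbiased data; frames from a biased run
# would carry non-uniform weights instead (assumption).
w_a = [1.0] * len(state_a)
w_b = [1.0] * len(state_b)

direction = lda_direction(state_a, w_a, state_b, w_b)
print(direction)
```

In an iterative scheme like the one the abstract describes, a direction of this kind would be recomputed each round from frames sampled while biasing along the previous coordinate.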

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3a2c/12207592/0bffb0affab7/jp5c02164_0001.jpg
