Learning Collective Variables with Synthetic Data Augmentation through Physics-Inspired Geodesic Interpolation.

Author information

Yang Soojung, Nam Juno, Dietschreit Johannes C B, Gómez-Bombarelli Rafael

Affiliations

Computational and Systems Biology Program, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States.

Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States.

Publication information

J Chem Theory Comput. 2024 Aug 13;20(15):6559-6568. doi: 10.1021/acs.jctc.4c00435. Epub 2024 Jul 29.

Abstract

In molecular dynamics simulations, rare events, such as protein folding, are typically studied using enhanced sampling techniques, most of which are based on the definition of a collective variable (CV) along which acceleration occurs. Obtaining an expressive CV is crucial, but often hindered by the lack of information about the particular event, e.g., the transition from unfolded to folded conformation. We propose a simulation-free data augmentation strategy using physics-inspired metrics to generate geodesic interpolations resembling protein folding transitions, thereby improving sampling efficiency without true transition state samples. This new data can be used to improve the accuracy of classifier-based methods. Alternatively, a regression-based learning scheme for CV models can be adopted by leveraging the interpolation progress parameter.
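
As a rough illustration of the regression-based scheme mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes a straight-line interpolation in interatomic-distance space as a simple stand-in for the physics-inspired geodesic, uses random toy coordinates in place of real unfolded and folded structures, and fits a linear model where the paper's CV model may differ. All function names (`pairwise_distances`, `interpolate_features`, `fit_linear_cv`) are hypothetical.

```python
import numpy as np


def pairwise_distances(coords):
    """Return the flattened upper-triangle interatomic distances
    of a single structure with shape (n_atoms, 3)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    iu = np.triu_indices(coords.shape[0], k=1)
    return dist[iu]


def interpolate_features(start, end, n_points):
    """ASSUMPTION: a straight line in distance space stands in for the
    geodesic interpolation. Returns the interpolated features with shape
    (n_points, n_pairs) and the progress parameter t in [0, 1]."""
    t = np.linspace(0.0, 1.0, n_points)
    f0 = pairwise_distances(start)
    f1 = pairwise_distances(end)
    feats = (1.0 - t)[:, None] * f0 + t[:, None] * f1
    return feats, t


def fit_linear_cv(feats, t):
    """Least-squares regression of the progress parameter on the features.
    ASSUMPTION: a linear model is used here purely for illustration."""
    X = np.hstack([feats, np.ones((feats.shape[0], 1))])
    # rcond discards near-zero singular values of the rank-deficient design matrix.
    coef, *_ = np.linalg.lstsq(X, t, rcond=1e-10)
    return coef[:-1], coef[-1]


# Toy end states (random coordinates, not real protein structures).
rng = np.random.default_rng(0)
unfolded = rng.normal(scale=3.0, size=(5, 3))
folded = rng.normal(scale=1.0, size=(5, 3))

# Synthetic "transition" samples labeled by interpolation progress.
feats, t = interpolate_features(unfolded, folded, n_points=11)

# Regression-based CV: the model maps features to the progress parameter.
w, b = fit_linear_cv(feats, t)
cv_values = feats @ w + b
```

In this sketch the progress parameter `t` serves as the regression target, so the fitted model assigns each synthetic sample a CV value between the two end states without any simulated transition data.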
