NIH Center for Macromolecular Modeling and Bioinformatics, Beckman Institute for Advanced Science and Technology, Department of Biochemistry, Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA.
Biophysics Program, University of Maryland, College Park, Maryland 20742, USA.
J Chem Phys. 2020 Dec 21;153(23):234118. doi: 10.1063/5.0030931.
Artificial intelligence (AI)-based approaches have had an undeniable impact across the sciences through their ability to extract relevant information from raw data. Recently, AI has also been used to enhance the efficiency of molecular simulations, wherein AI-derived slow modes are used to accelerate the simulation in targeted ways. However, while the fields where AI is typically used are characterized by a plethora of data, molecular simulations, by construction, suffer from limited sampling and thus limited data. As a result, the AI optimization in molecular simulations can get stuck in spurious regimes, leading to an incorrect characterization of the reaction coordinate (RC) for the problem at hand. When such an incorrect RC is then used to perform additional simulations, the results can deviate progressively from the ground truth. To deal with this problem of spurious AI solutions, we report here a novel and automated algorithm that uses ideas from statistical mechanics. It is based on the notion that a more reliable AI solution is one that maximizes the timescale separation between slow and fast processes. To learn this timescale separation even from limited data, we use a maximum caliber-based framework. We show the applicability of this automated protocol to three classic benchmark problems, namely, the conformational dynamics of a model peptide, ligand unbinding from a protein, and the folding/unfolding energy landscape of the C-terminal domain of protein G. We believe that our work will lead to increased and more robust use of trustworthy AI in molecular simulations of complex systems.
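The selection criterion described in the abstract (prefer the AI solution with the largest timescale separation between slow and fast processes) can be illustrated with a minimal sketch. This is not the authors' implementation: a simple count-based Markov transition matrix is swapped in for the maximum caliber-based estimate, the data are synthetic, and every name here (`transition_matrix`, `spectral_gap`, `trial_a`, `trial_b`, the bin count, the lag) is a hypothetical choice made for illustration.

```python
import numpy as np

def transition_matrix(rc_traj, n_bins=10, lag=1):
    """Row-stochastic transition matrix along a 1-D RC trajectory.

    Assumption: plain transition counting stands in for the
    maximum caliber-based estimate mentioned in the abstract.
    """
    edges = np.linspace(rc_traj.min(), rc_traj.max(), n_bins + 1)
    states = np.digitize(rc_traj, edges[1:-1])   # bin indices 0..n_bins-1
    counts = np.zeros((n_bins, n_bins))
    np.add.at(counts, (states[:-lag], states[lag:]), 1.0)
    counts = counts + counts.T                   # symmetrize (assumes detailed balance)
    visited = counts.sum(axis=1) > 0             # drop bins that were never sampled
    counts = counts[np.ix_(visited, visited)]
    return counts / counts.sum(axis=1, keepdims=True)

def spectral_gap(T, n_slow=1):
    """Gap between the slowest n_slow nontrivial eigenvalues and the rest."""
    evals = np.sort(np.real(np.linalg.eigvals(T)))[::-1]  # evals[0] is ~1 (stationary)
    return evals[n_slow] - evals[n_slow + 1]

# Synthetic data (assumption): x hops slowly between two wells, y is fast noise.
rng = np.random.default_rng(0)
n_steps = 20000
flips = rng.random(n_steps) < 0.01
well = np.where(np.cumsum(flips) % 2 == 0, -1.0, 1.0)
x = well + 0.2 * rng.standard_normal(n_steps)
y = rng.standard_normal(n_steps)
features = np.column_stack([x, y])

# Hypothetical candidate RCs, standing in for the outputs of independent AI trials.
candidates = {
    "trial_a": np.array([1.0, 0.0]),  # follows the slow coordinate
    "trial_b": np.array([0.0, 1.0]),  # spurious: follows the fast noise
}

# Score each candidate by its spectral gap and keep the largest.
gaps = {name: spectral_gap(transition_matrix(features @ w))
        for name, w in candidates.items()}
best = max(gaps, key=gaps.get)
print(gaps, "selected:", best)
```

In this toy setting, the candidate that tracks the slow, bistable coordinate yields a much larger gap than the one that tracks fast noise, which is the sense in which timescale separation can be used to rank competing solutions.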