Bhakat Soumendranath
Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Pennsylvania 19104-6059, USA
RSC Adv. 2022 Sep 2;12(38):25010-25024. doi: 10.1039/d2ra03660f. eCollection 2022 Aug 30.
Understanding the kinetic and thermodynamic profiles of biomolecules is necessary to understand their functional roles, which has a major impact on mechanism-driven drug discovery. Molecular dynamics simulation has been routinely used to understand conformational dynamics and molecular recognition in biomolecules. Statistical analysis of the high-dimensional spatiotemporal data generated by molecular dynamics simulation requires identifying a few low-dimensional variables that describe the essential dynamics of a system without significant loss of information. In physical chemistry, these low-dimensional variables are often called collective variables. Collective variables are used to generate reduced representations of free energy surfaces and to calculate transition probabilities between different metastable basins. However, the choice of collective variables is not trivial for complex systems. Collective variables range from geometric criteria, such as distances and dihedral angles, to abstract ones, such as weighted linear combinations of multiple geometric variables. The advent of machine learning algorithms has led to increasing use of abstract collective variables to represent biomolecular dynamics. In this review, I will highlight several nuances of commonly used collective variables, ranging from geometric to abstract ones. Further, I will put forward some cases where machine-learning-based collective variables were used to describe simple systems that could, in principle, have been described by geometric ones. Finally, I will put forward my thoughts on artificial general intelligence and how it can be used to discover and predict collective variables from spatiotemporal data generated by molecular dynamics simulations.
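To make the role of a collective variable concrete, the following is a minimal sketch rather than a method taken from the review. It assumes a synthetic two-state trajectory in place of real simulation data, and the feature names, weights, basin boundary, and temperature are all hypothetical choices made for illustration. It builds an abstract collective variable as a weighted linear combination of two geometric variables, estimates a one-dimensional free energy profile from the standard relation F(s) = -kT ln P(s), and counts transitions between two basins.

```python
import numpy as np

KT = 2.494  # kJ/mol, assuming a temperature of about 300 K


def linear_cv(features, weights):
    """Abstract CV: weighted linear combination of geometric variables."""
    return np.asarray(features) @ np.asarray(weights)


def free_energy_profile(cv, bins=50, kT=KT):
    """Estimate F(s) = -kT ln P(s) from a histogram of CV values."""
    density, edges = np.histogram(cv, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    visited = density > 0  # skip empty bins, where ln P is undefined
    F = -kT * np.log(density[visited])
    return centers[visited], F - F.min()  # shift so the global minimum is 0


def transition_matrix(cv, boundary, lag=1):
    """Row-normalized transition counts between two basins split at `boundary`.

    Assumes both basins are visited; otherwise a row sum would be zero.
    """
    states = (np.asarray(cv) > boundary).astype(int)
    counts = np.zeros((2, 2))
    np.add.at(counts, (states[:-lag], states[lag:]), 1)
    return counts / counts.sum(axis=1, keepdims=True)


# Synthetic stand-in for MD output: a system hopping between two basins.
rng = np.random.default_rng(0)
n_frames = 5000
state = np.cumsum(rng.random(n_frames) < 0.02) % 2  # ~2% hop chance per frame

# Two hypothetical geometric variables: a distance (nm) and a dihedral (rad).
distance = 0.4 + 0.6 * state + rng.normal(0.0, 0.05, n_frames)
dihedral = -1.0 + 2.0 * state + rng.normal(0.0, 0.2, n_frames)
features = np.column_stack([distance, dihedral])

weights = np.array([1.0, 0.5])  # hypothetical; ML methods would learn these
cv = linear_cv(features, weights)

centers, F = free_energy_profile(cv)
T = transition_matrix(cv, boundary=0.7)  # boundary placed between the basins
print("Transition matrix between basins:\n", T)
```

In this toy case either geometric variable alone already separates the two basins, so the linear combination adds little. That is the situation the abstract flags, where an abstract collective variable is applied to a system that a geometric one could describe.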