Oliver Fleetwood, Marina A. Kasimova, Annie M. Westerlund, Lucie Delemotte
Science for Life Laboratory, Department of Applied Physics, KTH Royal Institute of Technology, Solna, Sweden.
Biophys J. 2020 Feb 4;118(3):765-780. doi: 10.1016/j.bpj.2019.12.016. Epub 2019 Dec 21.
Biomolecular simulations are intrinsically high dimensional and generate noisy data sets of ever-increasing size. Extracting important features from the data is crucial for understanding the biophysical properties of molecular processes, but remains a big challenge. Machine learning (ML) provides powerful dimensionality reduction tools. However, such methods are often criticized as resembling black boxes with limited human-interpretable insight. We use methods from supervised and unsupervised ML to efficiently create interpretable maps of important features from molecular simulations. We benchmark the performance of several methods, including neural networks, random forests, and principal component analysis, using a toy model with properties reminiscent of macromolecular behavior. We then analyze three diverse biological processes: conformational changes within the soluble protein calmodulin, ligand binding to a G protein-coupled receptor, and activation of an ion channel voltage-sensor domain, unraveling features critical for signal transduction, ligand binding, and voltage sensing. This work demonstrates the usefulness of ML in understanding biomolecular states and demystifying complex simulations.
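To illustrate what an "interpretable map of important features" can look like, the following is a minimal sketch that uses only principal component analysis, one of the unsupervised methods named in the abstract. It is not the authors' implementation. The toy data set, the function names (`make_toy_data`, `pca_feature_importance`), and the importance definition (absolute loadings weighted by explained variance) are assumptions made for illustration.

```python
import numpy as np


def make_toy_data(n_frames=400, n_features=10, relevant=(2, 7), seed=0):
    """Hypothetical toy trajectory: two states, few state-dependent features."""
    rng = np.random.default_rng(seed)
    # Every feature fluctuates with small Gaussian noise.
    X = rng.normal(0.0, 0.1, size=(n_frames, n_features))
    # First half of the frames is state 0, second half is state 1.
    labels = np.repeat([0, 1], n_frames // 2)
    # Only the "relevant" features are displaced in state 1.
    for j in relevant:
        X[:, j] += labels * 1.0
    return X, labels


def pca_feature_importance(X, n_components=2):
    """Score features by variance-weighted absolute PCA loadings.

    This weighting scheme is an assumption of this sketch, not a
    definition taken from the paper.
    """
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes of the centered data.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var_ratio = s**2 / np.sum(s**2)
    weighted = var_ratio[:n_components, None] * np.abs(Vt[:n_components])
    importance = weighted.sum(axis=0)
    # Normalize so the most important feature scores 1.
    return importance / importance.max()


X, labels = make_toy_data()
importance = pca_feature_importance(X)
# Indices of the two highest-scoring features.
top_two = sorted(np.argsort(importance)[-2:].tolist())
print(top_two)
```

In this toy setting the two displaced features dominate the first principal component, so they receive the highest scores. In real simulation data the largest-variance motions need not be the functionally relevant ones, which is one reason to compare against supervised methods such as random forests and neural networks when state labels are available.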