Brownless Alfie-Louise R, Rheaume Elisa, Kuo Katie M, Kamerlin Shina C L, Gumbart James C
Interdisciplinary Graduate Program in Quantitative Biosciences, Georgia Institute of Technology, Atlanta, Georgia 30332, United States.
School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia 30332, United States.
J Phys Chem B. 2025 Jun 5;129(22):5375-5385. doi: 10.1021/acs.jpcb.4c08824. Epub 2025 May 27.
Machine learning (ML) techniques have become powerful tools in both industrial and academic settings. Their ability to facilitate the analysis of complex data and the generation of predictive insights is transforming how scientific problems are approached across a wide range of disciplines. In this tutorial, we present a cursory introduction to three widely used ML techniques (logistic regression, random forest, and multilayer perceptron) applied to the analysis of molecular dynamics (MD) trajectory data. We apply our chosen ML models to the study of the SARS-CoV-2 spike protein receptor binding domain interacting with the receptor ACE2. We develop a pipeline for processing MD simulation trajectory data and identifying residues that significantly impact the stability of the complex.
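As a rough illustration of the idea of ranking residues with a classifier, the sketch below fits a logistic regression by plain gradient descent and orders features by the magnitude of their learned weights. It is not the pipeline from the tutorial. The data are synthetic, and the names `make_synthetic_frames`, `fit_logistic_regression`, and `rank_residues` are hypothetical. It assumes each MD frame has been reduced to one numeric feature per residue and carries a binary label; both are assumptions made only for this example.

```python
import math
import random


def make_synthetic_frames(n_frames=400, n_residues=6, informative=2, seed=0):
    """Build synthetic stand-ins for per-frame, per-residue features.

    Assumption: one feature per residue and one binary label per frame.
    Only the feature at index `informative` carries signal about the label.
    """
    rng = random.Random(seed)
    features, labels = [], []
    for _ in range(n_frames):
        label = rng.randint(0, 1)
        row = [rng.gauss(0.0, 1.0) for _ in range(n_residues)]
        row[informative] += 2.0 if label == 1 else -2.0
        features.append(row)
        labels.append(label)
    return features, labels


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, row):
    """Predicted probability that a frame belongs to class 1."""
    return sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)


def fit_logistic_regression(features, labels, lr=0.1, epochs=200):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(features), len(features[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(features, labels):
            err = predict_proba(weights, bias, row) - label
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def rank_residues(weights):
    """Order feature indices by absolute weight, largest first."""
    return sorted(range(len(weights)), key=lambda j: abs(weights[j]), reverse=True)


features, labels = make_synthetic_frames()
weights, bias = fit_logistic_regression(features, labels)
ranking = rank_residues(weights)

# Training accuracy at a 0.5 decision threshold.
correct = sum(
    (predict_proba(weights, bias, row) >= 0.5) == (label == 1)
    for row, label in zip(features, labels)
)
accuracy = correct / len(features)

print("feature indices ranked by |weight|:", ranking)
print("training accuracy:", round(accuracy, 3))
```

Ranking by absolute weight is only meaningful when the features are on comparable scales, as they are in this synthetic data; real trajectory features would typically need standardizing first, and a held-out set would be needed to judge the model.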