Vargas-Rosales Pablo Andrés, Caflisch Amedeo
Department of Biochemistry, University of Zurich, Winterthurerstrasse 190, 8057 Zürich, Switzerland
RSC Med Chem. 2025 Jan 23;16(4):1499-1515. doi: 10.1039/d4md00869c. eCollection 2025 Apr 16.
A long path has led from the determination of the first protein structure in 1960 to the recent breakthroughs in protein science. Protein structure prediction and design methodologies based on machine learning (ML) were recognized with the 2024 Nobel Prize in Chemistry, but they would not have been possible without previous work and the input of many domain scientists. Challenges remain in applying ML tools to the prediction of structural ensembles and in using them within the software pipelines for structure determination by crystallography or cryogenic electron microscopy. In the drug discovery workflow, ML techniques are being used in diverse areas such as the scoring of docked poses and the generation of molecular descriptors. As ML techniques become more widespread, novel applications emerge that can profit from the large amounts of data available. Nevertheless, it is essential to balance the potential advantages against the environmental costs of ML deployment to decide if and when it is best to apply it. For hit-to-lead optimization, ML tools can efficiently interpolate between compounds in large chemical series, but free energy calculations by molecular dynamics simulations seem to be superior for designing novel derivatives. Importantly, the potential complementarity and/or synergism of physics-based methods (e.g., force field-based simulation models) and data-hungry ML techniques is growing strongly. Current ML methods have evolved from decades of research. It is now necessary for biologists, physicists, and computer scientists to fully understand the advantages and limitations of ML techniques to ensure that the complementarity of physics-based methods and ML tools can be fully exploited for drug design.