Cavendish Laboratory, University of Cambridge, Cambridge, UK.
Development & Medical, Pfizer Worldwide Research, Groton, CT, USA.
Nat Commun. 2024 Jan 15;15(1):426. doi: 10.1038/s41467-023-42145-1.
Structural diversification of lead molecules is a key way to explore chemical space in drug discovery. Late-stage functionalizations (LSFs) are versatile methodologies capable of installing functional handles on richly decorated intermediates to deliver numerous diverse products in a single reaction. Predicting the regioselectivity of LSF remains an open challenge in the field. Chemoinformatics and machine learning (ML) groups have made strides in this area. However, isolating and characterizing the multitude of LSF products generated is arduous, which limits the available data and hinders pure ML approaches. We report the development of an approach that combines a message passing neural network and ¹³C NMR-based transfer learning to predict the atom-wise probabilities of functionalization for Minisci and P450-based functionalizations. We validated our model both retrospectively and with a series of prospective experiments, showing that it accurately predicts the outcomes of Minisci-type and P450 transformations and outperforms the well-established Fukui-based reactivity indices and other machine learning reactivity-based algorithms.
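As a rough illustration of what "atom-wise probabilities" from a message passing network can mean, the sketch below runs one round of neighbor aggregation on a toy pyridine ring and applies an independent sigmoid readout per atom. It is not the authors' model: the two-element atom features, the helper names `message_passing_step` and `atom_probabilities`, and the weights are all hypothetical and untrained, and the NMR-based transfer learning step is not shown.

```python
import math

def message_passing_step(features, neighbors):
    """One message passing round: add the mean of neighbor features to each atom."""
    updated = []
    for atom, feats in enumerate(features):
        nbrs = neighbors[atom]
        # Aggregate (mean) each feature dimension over the bonded neighbors.
        agg = [sum(features[n][k] for n in nbrs) / len(nbrs)
               for k in range(len(feats))]
        updated.append([f + a for f, a in zip(feats, agg)])
    return updated

def atom_probabilities(features, weights, bias):
    """Per-atom sigmoid readout: one independent probability for each atom."""
    probs = []
    for feats in features:
        score = sum(w * f for w, f in zip(weights, feats)) + bias
        probs.append(1.0 / (1.0 + math.exp(-score)))
    return probs

# Toy pyridine ring: atom 0 is nitrogen, atoms 1-5 are carbons (hydrogens omitted).
neighbors = {0: [1, 5], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 0]}

# Hypothetical atom features: [is_nitrogen, is_carbon].
features = [[1.0, 0.0]] + [[0.0, 1.0] for _ in range(5)]

hidden = message_passing_step(features, neighbors)

# Arbitrary, untrained readout parameters chosen only for illustration.
weights = [1.5, 0.2]
bias = -1.0
probs = atom_probabilities(hidden, weights, bias)

for atom, p in enumerate(probs):
    print(f"atom {atom}: p(functionalization) = {p:.3f}")
```

After one round, the carbons bonded to nitrogen carry information about that neighbor, so they receive a different score from the more distant carbons. With these arbitrary weights the score happens to be higher at the positions adjacent to nitrogen; a trained model would learn such preferences from data rather than having them set by hand.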